There's a massive gap between the tools people demo on Twitter and the tools that survive production. After shipping 4 AI products around a full-time banking job (15–20 hours/week of build time), I've become ruthless about what stays in my stack.
If it doesn't save me measurable time, it's gone. Here's what survived.
The Complete Stack (March 2026)
My Production AI Stack
Development
- IDE: Cursor + Claude Code (primary)
- Language: TypeScript (90%), Python (10%)
- Framework: Next.js 16 (frontend + API)
- Backend: Node.js + Express (when needed)
- Version control: Git + GitHub
AI / LLM
- Primary: Claude 3.5 Sonnet (complex reasoning)
- Fast: GPT-4o-mini (extraction, classification)
- Grounded: Gemini (real-time web data)
- Embeddings: text-embedding-3-large (OpenAI)
- Gateway: Custom multi-provider router
Data
- Database: PostgreSQL 15 (Cloud SQL)
- Vector: pgvector (same Postgres instance)
- Cache: Redis (query dedup, sessions)
- Queue: GCP Pub/Sub (async jobs)
Infrastructure
- Compute: GCP Cloud Run (containers, auto-scale)
- Container: Docker (multi-stage builds)
- DNS/CDN: Cloudflare
- Email: Resend (transactional)
- Monitoring: Custom + Cloud Run logs
Monthly Cost (all 4 products)
| Service | Cost |
|---|---|
| Cloud Run | ~$15–25 |
| Cloud SQL | ~$30–40 |
| LLM APIs | ~$20–50 (depends on usage) |
| Redis | ~$10 |
| Domains | ~$3 |
| Total | ~$80–130/month |
$80-130/month for 4 production AI systems. Not bad.
Why Each Tool Won Its Slot
Claude Code — The 5x Multiplier
This is the single biggest force multiplier. It's not a chatbot — it's an AI agent that reads your entire codebase, understands architecture patterns, and implements changes across multiple files consistently.
What it actually does for me:
| Task | Without Claude Code | With Claude Code |
|---|---|---|
| New API endpoint | 45–60 min | 10–15 min |
| Database migration + types | 30–45 min | 5–10 min |
| React component with API integration | 60–90 min | 15–25 min |
| Total for a typical feature | 3–4 hours | 45–75 min |
Net savings: ~2.5 hours per feature. At 3–4 features/week, that's roughly 7.5–10 hours saved — about half my available work week back.
The skill isn't prompting. It's architecture judgment, context curation, and code review discipline. I wrote about this in detail in my Claude Code article.
PostgreSQL + pgvector — One Database to Rule Them All
I run everything on PostgreSQL. App data, vector embeddings, sessions, job queues — all in one database.
My Database Philosophy
| Layer | What most people do | What I do |
|---|---|---|
| App data | PostgreSQL | PostgreSQL |
| Vectors | Pinecone | PostgreSQL (pgvector) |
| Cache | Redis | PostgreSQL (+ Redis for hot cache only) |
| Queue | RabbitMQ | PostgreSQL (+ Pub/Sub for async only) |
| Sessions | Separate store | PostgreSQL |
| Services | 4 | 1 (+ 2 optional) |
| Failure points | 4 | 1 |
| Bills | 4 | 1 |
| Ops complexity | HIGH | LOW |
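The "Postgres as queue" row deserves a concrete illustration. The standard pattern is `FOR UPDATE SKIP LOCKED`, which lets multiple workers poll one table without blocking each other or double-claiming jobs. This is a minimal sketch under assumed table and column names (`jobs`, `status`, `created_at`) — not my actual schema:

```typescript
interface Job { id: number; payload: unknown }

// Minimal client interface so the sketch stays dependency-free;
// in practice this would be a `pg` Pool.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<{ rows: Job[] }>;
}

// SKIP LOCKED makes concurrent workers skip rows another worker
// has already locked, instead of waiting on them.
const CLAIM_JOB_SQL = `
  UPDATE jobs SET status = 'running', started_at = now()
  WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
  )
  RETURNING id, payload`;

async function claimNextJob(db: SqlClient): Promise<Job | null> {
  const { rows } = await db.query(CLAIM_JOB_SQL);
  return rows[0] ?? null;
}
```

When throughput outgrows polling, that's the point where Pub/Sub takes over — but for side-project volumes, one table is enough.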
pgvector handles my RAG needs perfectly. HNSW indexes give sub-100ms similarity search on datasets under 10M vectors. That covers 99% of enterprise use cases.
Why not Pinecone? I already pay for Cloud SQL. pgvector is a Postgres extension — zero additional cost. And I don't need another vendor's dashboard, another API key, another point of failure.
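For reference, a pgvector similarity query is just SQL. This sketch assumes an illustrative `documents(content text, embedding vector)` table with an HNSW index built via `CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)` — names are mine, not from any real schema:

```typescript
// `<=>` is pgvector's cosine distance operator; ordering by it
// lets the HNSW index drive the nearest-neighbor scan.
const SEARCH_SQL = `
  SELECT content, 1 - (embedding <=> $1::vector) AS similarity
  FROM documents
  ORDER BY embedding <=> $1
  LIMIT $2`;

// pgvector accepts embeddings as a '[1,2,3]'-style text literal
// when passed as a query parameter.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}
```

The whole retrieval layer stays inside the same connection pool, the same transactions, and the same backups as the rest of the app data.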
Multi-Provider LLM — The Right Tool Per Job
Provider loyalty is expensive. Here's my actual routing logic:
| Task | Provider | Why |
|---|---|---|
| Complex reasoning | Claude 3.5 Sonnet | Best instruction-following |
| Simple extraction | GPT-4o-mini | 10x cheaper, fast enough |
| Real-time web data | Gemini | Native Google Search grounding |
| Embeddings | OpenAI | Best quality/cost ratio |
| Voice transcription | Whisper | Best accuracy |
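In code, that routing table collapses into a small lookup. This is a simplified sketch — the task labels and model identifiers are illustrative, not my actual router:

```typescript
type Task = "reasoning" | "extraction" | "grounded" | "embedding" | "transcription";

interface Route { provider: string; model: string }

// One place to change when a provider ships a better/cheaper model.
const ROUTES: Record<Task, Route> = {
  reasoning:     { provider: "anthropic", model: "claude-3-5-sonnet-latest" },
  extraction:    { provider: "openai",    model: "gpt-4o-mini" },
  grounded:      { provider: "google",    model: "gemini-1.5-flash" },
  embedding:     { provider: "openai",    model: "text-embedding-3-large" },
  transcription: { provider: "openai",    model: "whisper-1" },
};

function pickRoute(task: Task): Route {
  return ROUTES[task];
}
```

Callers declare *what kind* of work they need done, never a specific model — which is what makes the cost numbers below possible to tune without touching product code.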
Cost impact:
| Approach | Monthly Cost |
|---|---|
| Claude for everything | ~$200/month |
| With smart routing | ~$30–50/month |
| Savings | 75%+ |
GCP Cloud Run — The Underrated Platform
Cloud Run is what I recommend to anyone running side projects or early-stage products. Here's why:
Cloud Run vs Alternatives
| Feature | Cloud Run | Vercel | AWS Lambda |
|---|---|---|---|
| Pricing | Per-request | Per-request | Per-request |
| Idle cost | $0 | $0* | $0 |
| Container | Full Docker | Serverless | Serverless |
| Background | Yes (workers) | No | Step Functions |
| WebSocket | Yes | Limited | API Gateway |
| Custom domain | Yes | Yes | Complex |
| Deploy speed | ~3 min | ~30 sec | ~2 min |
| Lock-in | Low | High | Very High |
* Vercel gets expensive fast with API-heavy apps (bandwidth + function invocations add up)
I deploy a Docker container and Cloud Run handles scaling, SSL, and custom domains. My side projects cost under $20/month to run. When Premier Radar gets real traffic, Cloud Run auto-scales — I don't need to change anything.
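The only contract Cloud Run imposes on the container is "listen on the port in the `PORT` env var." A minimal Node server that satisfies it (plain `http` here for brevity; any framework works the same way):

```typescript
import * as http from "http";

// Cloud Run injects PORT at runtime; 8080 is its documented default.
const port = Number(process.env.PORT ?? 8080);

const server = http.createServer((_req, res) => {
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ status: "ok" }));
});

server.listen(port, () => console.log(`listening on ${port}`));
```

Everything else — TLS termination, scale-to-zero, concurrency, custom domains — is configuration on the service, not code in the app.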
What I Stopped Using (And Why)
| Removed Tool | Reason |
|---|---|
| LangChain | Too much abstraction for what it does. My RAG pipeline is ~200 lines of TypeScript. Cleaner, debuggable, no dependency bloat. |
| Pinecone | Unnecessary with pgvector. One less vendor. |
| Vercel (for API apps) | Cost model doesn't work for API-heavy projects. $20/month became $80/month quickly. |
| Jupyter notebooks (prod) | Great for exploration. Terrible for production. Everything goes into proper TS/Python modules. |
| MongoDB | Switched to PostgreSQL. Relational + vector in one database beats two separate systems. |
| Supabase | Good product, but I want full Postgres control. Cloud SQL gives me that. |
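To make the LangChain point concrete: the core of a hand-rolled RAG step is shorter than most framework tutorials. This is an outline with hypothetical function and field names, not my actual pipeline:

```typescript
interface Chunk { content: string; similarity: number }

// Retrieve → filter → format: the part frameworks wrap in layers
// of abstraction is mostly string assembly.
function buildRagPrompt(question: string, chunks: Chunk[], minSim = 0.7): string {
  // Drop weak matches so irrelevant context doesn't inflate the prompt
  const context = chunks
    .filter(c => c.similarity >= minSim)
    .map((c, i) => `[${i + 1}] ${c.content}`)
    .join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

Embed the question, run the pgvector query, pass the hits through a function like this, call the model: four steps you can read top to bottom and debug with `console.log`.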
The Principle Behind It All
Every tool I cut follows the same logic:
Decision Framework
| Question | If YES | If NO |
|---|---|---|
| Does this tool solve a problem I actually have? | Continue below | Remove it |
| Can I solve this with a tool I already use? | Use the existing tool | Continue below |
| Is the added complexity worth the benefit? | Add the tool | Build a simple alternative (~200 lines max) |
Most developers do the opposite: they start with tools and look for problems to solve. I start with problems and find the simplest tool that solves them.
After shipping 4 production AI systems on ~$100/month, I'm pretty confident this approach works.
Ready to move AI from pilot to production?
15 minutes to diagnose what's blocking your AI initiative. No pitch — just a conversation.
Book a 15-min diagnostic call