Agent Squad · The Workaround Stack 2026

The Workaround Stack +
how we build it today

SubQ claims that its 12M context eliminates the intermediate layers (Vector DB, Chunking, RAG). Today we are far from that promise. This is an analysis of which 2026 technologies are the winners for implementing each layer in Agent Squad, with InsForge + Vercel AI SDK already pinned.

Stack layers

Winning technologies

Implementation phases

Agent Squad use cases

Total effort: 3-5 weeks

Model	Context (real)	Strength	Cost	Verdict
Gemini 3.1 Pro	2M native · degrades >500K	Long-context + multimodal	Medium	Watch
Claude Opus 4.7 Adaptive	200K + context caching	Agentic, code, planning	High	Architect
Claude Sonnet 4.6	200K + caching	Balanced speed/cost	Medium	Dev
GPT-5.5	1M	Reasoning	High	Do not use
DeepSeek V4 Pro	200K	Open weights, low cost	Very low	Watch
SubQ 1M-Preview	1M prod · 12M research	Anti-RAG, linear scaling	Private beta	Q3 2026

DB	Type	Best for	Agent Squad verdict
pgvector	Postgres extension	Teams already on Postgres	InsForge uses this
Qdrant	Rust standalone	Speed leader open-source	Unnecessary
Pinecone	Managed cloud	Enterprise zero-ops	Another dependency
Weaviate	Open-source modular	Native hybrid search	Overhead
Milvus	Enterprise heavyweight	Scale >100M vectors	Overkill
LanceDB	Embedded	Edge, mobile, offline	Watch · future
Turbopuffer	Object storage backed	Multi-tenant SaaS	Watch · 2027

Model	Dims	Strength	$ / 1M tokens	Verdict
OpenAI text-embedding-3-large	3072	Best overall	$0.13	Fallback
Cohere embed-v4	1536	Best multilingual + matryoshka	$0.10	If i18n
Voyage AI voyage-3.5-large	1024	★ Best for code & technical	$0.18	Pick
Jina v4	2048	Multimodal (text+image)	$0.05	If multimodal
Qwen3 embeddings	1024	Open-source strong	Self-host	No infra
Gemini text-embedding	768	Good cost/quality ratio	$0.02	Low-cost backup

Technique	Type	Gain / cost	Verdict
Contextual Retrieval (Anthropic)	Chunking	-49% retrieval failures · synthesized header	Pick
Late chunking (Jina)	Chunking	Preserves context across chunks	Alt
LlamaIndex SemanticSplitterNodeParser	Chunking	Splits by similarity, not fixed token counts	Pick
Agentic chunking	Chunking	LLM decides splits · expensive	Premature
Cohere Rerank v3.5	Reranking	~150ms · $1/1K queries	Pick
Jina reranker v3	Reranking	~120ms · $0.5/1K queries	Backup
ColBERT v2 / late-interaction	Reranking	Top accuracy in research	Self-host
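
The "synthesized header" idea behind Contextual Retrieval can be sketched as follows: before embedding, each chunk gets a short piece of context that situates it within its source document. This is a minimal sketch, not Anthropic's reference implementation. `ContextWriter`, `withContextHeader` and `contextualizeChunks` are hypothetical names, and the LLM call that writes the header is injected rather than shown.

```typescript
// Hypothetical signature for the LLM call that writes a short, chunk-specific
// context header (assumption: in practice this would be a cheap model call).
type ContextWriter = (fullDocument: string, chunk: string) => Promise<string>;

// Prepend the synthesized header to the chunk; this combined text is what
// would be embedded. An empty header leaves the chunk unchanged.
function withContextHeader(chunk: string, header: string): string {
  const trimmed = header.trim();
  return trimmed.length === 0 ? chunk : `${trimmed}\n\n${chunk}`;
}

// Contextualize every chunk of one document, one model call per chunk.
async function contextualizeChunks(
  fullDocument: string,
  chunks: string[],
  writeContext: ContextWriter,
): Promise<string[]> {
  const result: string[] = [];
  for (const chunk of chunks) {
    const header = await writeContext(fullDocument, chunk);
    result.push(withContextHeader(chunk, header));
  }
  return result;
}
```

Because the full document is resent for every chunk, this step is a natural candidate for prompt caching on the document prefix.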

Framework	Strength	Lang	Verdict
LlamaIndex.TS	100+ data loaders, best data ingestion	TypeScript	Pick for data
LangChain / LangGraph	126K stars, graph orchestration	Python + JS	Overhead
DSPy (Stanford)	Programming-first, prompts as code	Python	Premature
Haystack	Hugging Face native, enterprise	Python	Wrong lang
RAGFlow	Full-stack open-source with UI	Python	No self-host
Pathway	Streaming RAG (real-time)	Python	If live data

Framework	Stars / maturity	Lang	Agent Squad verdict
LangGraph	126K · graph-based	Python + JS	Overhead for our flow
Vercel AI SDK 6.0 ToolLoopAgent	Edge-native, provider-agnostic	JS/TS	Pick (apps/api)
Claude Agent SDK 0.1.50	Anthropic-native, model mix	Python	Miles (lateral)
OpenAI Agents SDK	New in 2026 · production-ready	Python + JS	Vendor lock
CrewAI	80K+ · role-based	Python	Wrong lang
AutoGen / AG2 (Microsoft)	Rebuilt 2026, conversational	Python	Wrong lang
Mastra	Trending TS-first	JS/TS	Watch · 2027
Google ADK / Pydantic AI	New in 2026	Python	Wrong lang

Provider	Discount	TTL	Verdict
Anthropic Claude	90% off cached input	5 min - 1 hour	Critical · use aggressively
Gemini	75% off	up to 1 hour	If we migrate
OpenAI	Automatic on GPT-5.5 (implicit)	No tier exposed	No control
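
To make the discount column concrete, here is a rough sketch of the input-cost arithmetic per turn. It is a simplification: it ignores output tokens and any surcharge a provider applies when writing to the cache. `inputCostPerTurn` is a hypothetical helper, and the base price is a parameter, not a quoted rate.

```typescript
// Estimate the input cost of one turn when part of the prompt is served from cache.
// cachedDiscount is the fraction taken off cached tokens (0.9 for "90% off").
function inputCostPerTurn(opts: {
  promptTokens: number;   // total input tokens in the turn
  cachedTokens: number;   // how many of those were read from cache
  pricePerMTok: number;   // base input price per 1M tokens (assumed parameter)
  cachedDiscount: number; // 0..1
}): number {
  const { promptTokens, cachedTokens, pricePerMTok, cachedDiscount } = opts;
  if (cachedTokens < 0 || cachedTokens > promptTokens) {
    throw new Error("cachedTokens must be between 0 and promptTokens");
  }
  if (cachedDiscount < 0 || cachedDiscount > 1) {
    throw new Error("cachedDiscount must be between 0 and 1");
  }
  const freshTokens = promptTokens - cachedTokens;
  const freshCost = freshTokens * pricePerMTok;
  const cachedCost = cachedTokens * pricePerMTok * (1 - cachedDiscount);
  return (freshCost + cachedCost) / 1_000_000;
}
```

For example, if 18,000 of 20,000 prompt tokens are cached at a 90% discount, the turn is billed as 2,000 + 1,800 = 3,800 full-price token equivalents, so input cost falls by about 81% regardless of the base price.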

Layer	Have today	Missing	Status
L1 · Model	Claude Sonnet/Opus via SDK	Aggressive context caching	✓ Covered
L2 · Vector DB	InsForge `vector` (pgvector)	Schema + HNSW index	✓ Available
L2b · Embeddings	-	Voyage 3.5 large API key + ingestion	○ To do
L3 · Chunking + Rerank	-	LlamaIndex semantic + Cohere Rerank	○ To do
L4 · RAG pipeline	-	LlamaIndex.TS + custom router	○ To do
L5 · Orchestration	Vercel AI SDK 6 + Claude Agent SDK	Custom InsForge tools	✓ Covered

Embeddings + Vector

1-2 weeks

Architect: InsForge schema office_documents (id, user_id, content, embedding, source_type) + HNSW index
Dev: ingest endpoint in apps/api: file → semantic chunking → Voyage embed → InsForge insert
Test: upload one technical brief and retrieve it with the query "what is the user's goal"
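
The schema and ingest path above might look roughly like the sketch below. Everything not named in the plan is an assumption: the column types (uuid, text), the index name, the 1024-dimension vector (taken from the Voyage row in the embeddings table), and the `IngestDeps` shape, which injects the chunker, embedder and InsForge insert instead of calling any real client.

```typescript
// Assumed dimension, matching the 1024-dim Voyage model in the embeddings table.
const EMBEDDING_DIMS = 1024;

// Assumed pgvector DDL for office_documents; column types are illustrative.
const OFFICE_DOCUMENTS_DDL = `
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS office_documents (
  id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id     uuid NOT NULL,
  content     text NOT NULL,
  embedding   vector(${EMBEDDING_DIMS}) NOT NULL,
  source_type text NOT NULL
);
CREATE INDEX IF NOT EXISTS office_documents_embedding_hnsw
  ON office_documents USING hnsw (embedding vector_cosine_ops);
`;

type OfficeDocumentRow = {
  user_id: string;
  content: string;
  embedding: string; // pgvector text literal, e.g. "[0.1,0.2,...]"
  source_type: string;
};

// pgvector accepts a vector as a text literal of the form "[1,2,3]".
function toVectorLiteral(embedding: number[], dims: number = EMBEDDING_DIMS): string {
  if (embedding.length !== dims) {
    throw new Error(`expected ${dims} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// Hypothetical dependencies: semantic chunker, embeddings call, InsForge insert.
type IngestDeps = {
  chunk: (text: string) => Promise<string[]>;
  embed: (texts: string[]) => Promise<number[][]>;
  insert: (rows: OfficeDocumentRow[]) => Promise<void>;
};

// file text -> chunks -> embeddings -> rows -> insert; returns the row count.
async function ingestDocument(
  deps: IngestDeps,
  userId: string,
  sourceType: string,
  text: string,
): Promise<number> {
  const chunks = await deps.chunk(text);
  const embeddings = await deps.embed(chunks);
  if (embeddings.length !== chunks.length) {
    throw new Error("embedder returned a different number of vectors than chunks");
  }
  const rows = chunks.map((content, i) => ({
    user_id: userId,
    content,
    embedding: toVectorLiteral(embeddings[i] ?? []),
    source_type: sourceType,
  }));
  await deps.insert(rows);
  return rows.length;
}
```

Injecting the three dependencies keeps the endpoint testable with fakes before the Voyage API key and the InsForge table exist.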

Contextual + Rerank

1 week

Architect: add a chunk_context column with a header synthesized by Haiku 4.5 ($0.25/M, nearly free)
Dev: tool searchOfficeDocuments(query): embed query → top-20 cosine → Cohere Rerank → top-5
Bench: measure recall on 10 test queries before/after reranking
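
The two-stage retrieval and the recall benchmark can be sketched without any external service. This is an in-memory illustration under stated assumptions: `Embed` and `Rerank` are hypothetical injected functions standing in for the embeddings and rerank APIs, and in production the top-20 step would more likely run inside Postgres (pgvector's `<=>` operator orders by cosine distance).

```typescript
type Chunk = { id: string; content: string; embedding: number[] };

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 1: the k chunks most similar to the query embedding.
function topKByCosine(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.chunk);
}

// Hypothetical injected services.
type Embed = (text: string) => Promise<number[]>;
type Rerank = (query: string, documents: string[]) => Promise<number[]>; // one score per document

// embed query -> top-20 cosine -> rerank -> top-5
async function searchOfficeDocuments(
  query: string,
  chunks: Chunk[],
  embed: Embed,
  rerank: Rerank,
): Promise<Chunk[]> {
  const queryEmbedding = await embed(query);
  const candidates = topKByCosine(queryEmbedding, chunks, 20);
  const scores = await rerank(query, candidates.map((c) => c.content));
  return candidates
    .map((chunk, i) => ({ chunk, score: scores[i] ?? 0 }))
    .sort((x, y) => y.score - x.score)
    .slice(0, 5)
    .map((x) => x.chunk);
}

// Fraction of the known-relevant chunk ids that appear in the retrieved list.
function recallAtK(retrievedIds: string[], relevantIds: string[]): number {
  if (relevantIds.length === 0) return 0;
  const hits = relevantIds.filter((id) => retrievedIds.includes(id)).length;
  return hits / relevantIds.length;
}
```

For the benchmark, `recallAtK` would be computed per test query on the top-5 ids with and without the rerank stage, then averaged over the 10 queries.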

Tool integration

1 week

Dev: register the tool in the Vercel AI SDK ToolLoopAgent with a Zod inputSchema
E2E test: the user's agent receives a question → calls searchOfficeDocuments → answers with the retrieved context
qa_worker: visual review + Playwright E2E of the full flow
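
The control flow the E2E test exercises can be illustrated with a dependency-free sketch. This is deliberately not the Vercel AI SDK API: `runToolLoop`, `Model`, `Step` and `parseSearchInput` are hypothetical stand-ins, with the hand-written validator playing the role a Zod inputSchema would play in the real registration.

```typescript
// Hypothetical validator standing in for a Zod inputSchema of { query: string }.
function parseSearchInput(input: unknown): { query: string } {
  if (typeof input === "object" && input !== null) {
    const candidate = (input as { query?: unknown }).query;
    if (typeof candidate === "string" && candidate.trim().length > 0) {
      return { query: candidate.trim() };
    }
  }
  throw new Error("searchOfficeDocuments expects { query: non-empty string }");
}

// One step of model output: either a tool call or the final answer.
type Step =
  | { type: "tool-call"; toolName: string; input: unknown }
  | { type: "text"; text: string };

// Hypothetical model interface: sees the history so far, returns the next step.
type Model = (history: string[]) => Step;

type ToolFn = (input: unknown) => string;

// Loop: ask the model, run any requested tool, append the result, repeat
// until the model answers in text or the step budget runs out.
function runToolLoop(
  model: Model,
  tools: Record<string, ToolFn>,
  prompt: string,
  maxSteps: number = 5,
): string {
  const history: string[] = [`user: ${prompt}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(history);
    if (step.type === "text") return step.text;
    const tool = tools[step.toolName];
    if (!tool) throw new Error(`unknown tool: ${step.toolName}`);
    history.push(`tool(${step.toolName}): ${tool(step.input)}`);
  }
  throw new Error("tool loop exceeded maxSteps");
}
```

A scripted `Model` that first requests `searchOfficeDocuments` and then answers is enough to test the question → tool call → answer path before a real model is wired in.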

Context caching

3-5 days

Dev: wire cache_control: { type: 'ephemeral' } into the system prompt + office state
Bench: measure cost/turn before vs. after · target 70%+ reduction
Telemetry: track cache hit rate in logs (target >80%)
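
A minimal sketch of the wiring and the telemetry, assuming the Anthropic Messages API shape: `system` passed as an array of text blocks, with `cache_control` marking the end of the cacheable prefix, and a `usage` object that reports cached reads and cache writes separately. `buildSystemBlocks` and `cacheHitRate` are hypothetical helpers, and the token-weighted hit rate is one reasonable definition among several.

```typescript
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// System prompt first, office state second. The breakpoint sits on the last
// block, so the prefix up to and including it is eligible for caching.
function buildSystemBlocks(systemPrompt: string, officeState: string): TextBlock[] {
  return [
    { type: "text", text: systemPrompt },
    { type: "text", text: officeState, cache_control: { type: "ephemeral" } },
  ];
}

// Subset of the usage fields returned with each response.
type Usage = {
  input_tokens: number;                // uncached input tokens
  cache_creation_input_tokens: number; // tokens written to the cache
  cache_read_input_tokens: number;     // tokens served from the cache
};

// Token-weighted hit rate: cached reads over all input tokens seen.
function cacheHitRate(usages: Usage[]): number {
  let read = 0;
  let total = 0;
  for (const u of usages) {
    read += u.cache_read_input_tokens;
    total += u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  }
  return total === 0 ? 0 : read / total;
}
```

The first turn of a session only writes to the cache, so the rate climbs with the number of turns per session; a short session can sit well under the 80% target even when caching works as intended.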

The Workaround Stack

Winning technologies 2026

Mapping to Agent Squad

Recommended stack

Implementation plan · 4 phases

Concrete use cases in Agent Squad

Verdict