pgvector vs. Pinecone: Choosing the Right Vector Store for Your AI Chatbot

We compare managed vector databases against PostgreSQL's pgvector extension for RAG-powered chatbot applications — with real benchmark numbers from production.


The vector store decision

If you're building a RAG-powered chatbot — one that retrieves relevant context from a knowledge base before generating a response — you need a vector store. The two most common choices in 2026 are pgvector (a PostgreSQL extension) and Pinecone (a managed vector database service).

We tested both while building SellyChat. This article shares what we learned, including benchmark numbers, operational trade-offs, and the reasoning behind the decision to standardize on pgvector.

What each option actually is

pgvector

pgvector is an open-source PostgreSQL extension that adds vector similarity search to your existing Postgres database. You store embeddings as a native column type, create an index (IVFFlat or HNSW), and query with standard SQL using distance operators like <-> (L2), <=> (cosine distance), or <#> (negative inner product).
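To make the operators concrete, here's a plain-Python sketch of the math behind two of them (pgvector computes these natively in C; this is purely for illustration):

```python
import math

def cosine_distance(a, b):
    """Cosine distance, what the <=> operator returns: 1 - (a.b) / (|a||b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def l2_distance(a, b):
    """Euclidean (L2) distance, the <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Same direction -> distance 0; orthogonal -> distance 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Cosine distance ignores vector magnitude, which is why it's the usual choice for normalized text embeddings.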

The key advantage: your vectors live alongside your relational data. No separate service, no data synchronization, no additional infrastructure to manage.

Pinecone

Pinecone is a fully managed vector database. You send vectors via an API, and Pinecone handles indexing, sharding, replication, and scaling. It supports metadata filtering, namespaces, and a newer serverless pricing tier.

The key advantage: zero operational overhead. You don't manage indexes, tune parameters, or worry about scaling. It just works — as long as you're comfortable with the pricing and the vendor dependency.

Benchmark: query latency

We benchmarked both options using a dataset of 500,000 document chunks (1536-dimension embeddings from OpenAI's text-embedding-3-small model) on comparable infrastructure.
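The timing harness itself isn't shown here, but the percentile math is straightforward. This sketch computes nearest-rank p50/p99 from a list of per-query latencies (the sample values are toy numbers, not our real measurements):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile over a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Toy values for illustration only.
latencies_ms = [7, 8, 8, 9, 8, 23, 8, 7, 9, 8]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))  # 8 23
```

Note that p99 is dominated by the single slowest queries, which is exactly why it's worth reporting alongside p50.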

  • pgvector (HNSW index, ef_search=40): p50 latency 8ms, p99 latency 23ms
  • Pinecone (Serverless, us-east-1): p50 latency 35ms, p99 latency 85ms
  • Pinecone (Pod-based, p1.x1): p50 latency 12ms, p99 latency 32ms

pgvector with HNSW is consistently faster for our workload. The difference is primarily network overhead — pgvector queries stay within the same database connection your application already uses, while Pinecone requires an external API call.

For a chatbot, this matters. Every millisecond in the retrieval step adds to the total response latency the user experiences. At 500K chunks, pgvector's 8ms p50 is effectively free in the context of a 200–400ms LLM call.
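A back-of-the-envelope sketch makes the latency budget concrete (the 300ms LLM figure is just the midpoint of the range above, not a measurement):

```python
retrieval_pgvector_ms = 8    # pgvector p50 from our benchmark
retrieval_serverless_ms = 35 # Pinecone Serverless p50 from our benchmark
llm_call_ms = 300            # assumed midpoint of the 200-400ms range

pg_share = retrieval_pgvector_ms / (retrieval_pgvector_ms + llm_call_ms)
sl_share = retrieval_serverless_ms / (retrieval_serverless_ms + llm_call_ms)
print(f"pgvector retrieval: {pg_share:.1%} of response time")    # 2.6%
print(f"serverless retrieval: {sl_share:.1%} of response time")  # 10.4%
```

Either way the LLM call dominates, but the managed option's share of the budget is roughly four times larger.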

Benchmark: recall accuracy

Both options achieve comparable recall when properly configured:

  • pgvector (HNSW, m=16, ef_construction=64): 98.2% recall@10
  • Pinecone: 99.1% recall@10

Pinecone edges ahead slightly on recall, but the difference is negligible for chatbot applications where you're typically retrieving the top 3–5 chunks and feeding them to an LLM that's tolerant of minor relevance variations.
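Recall@10 is easy to compute yourself when validating an index: compare the ANN result against ground truth from an exact (sequential) scan. A sketch with toy IDs:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the true top-k neighbors present in the ANN result."""
    return len(set(retrieved[:k]) & set(relevant[:k])) / k

exact_top10 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # ground truth from an exact scan
ann_top10 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]    # ANN result missed one neighbor
print(recall_at_k(ann_top10, exact_top10, 10))  # 0.9
```

Averaging this over a few hundred held-out queries gives the recall figures reported above.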

Cost comparison

This is where the decision gets interesting.

  • pgvector: Free (it's a Postgres extension). Your only cost is the database instance you're already running. Adding vector columns and HNSW indexes increases storage and memory usage, but for most chatbot workloads (under 1M chunks), the incremental cost is negligible.
  • Pinecone Serverless: Effectively free at low usage, but scales to $70–200/month for 500K+ vectors with moderate query volume.
  • Pinecone Pods: Starts at ~$70/month for a single p1.x1 pod. Production deployments with replicas run $200–500/month.
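Annualized, the gap is easier to see. This sketch uses the midpoints of the ranges quoted above; real bills depend entirely on workload:

```python
# Midpoints of the quoted ranges; illustrative assumptions, not quotes.
pgvector_month = 0.0               # incremental cost on an existing Postgres instance
serverless_month = (70 + 200) / 2  # Pinecone Serverless, 500K+ vectors
pods_month = (200 + 500) / 2       # Pinecone Pods with replicas

for name, monthly in [("pgvector", pgvector_month),
                      ("Pinecone Serverless", serverless_month),
                      ("Pinecone Pods", pods_month)]:
    print(f"{name}: ${monthly * 12:,.0f}/year")
```
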

For a SaaS platform like SellyChat that manages multiple knowledge bases across customer accounts, the cost difference is significant. pgvector lets us store all customer vectors in the same Postgres database we already use for everything else, with zero additional infrastructure cost.

Operational complexity

This is Pinecone's strongest argument. With pgvector, you're responsible for:

  • Choosing the right index type (IVFFlat vs. HNSW) and tuning parameters
  • Monitoring index build times and memory usage
  • Handling reindexing when you change embedding models
  • Scaling your Postgres instance as vector data grows

With Pinecone, all of that is abstracted away. You call an API, vectors get indexed, queries return results. For teams without strong database expertise, this is a legitimate advantage.

That said, if you're already running PostgreSQL (and most web applications are), adding pgvector is a single CREATE EXTENSION command, provided the extension is available on your server — managed providers such as RDS and Supabase ship it. The HNSW index type, introduced in pgvector 0.5.0, is the default recommendation and requires minimal tuning for most workloads.
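In practice, most of that tuning fits in two settings. A sketch — the values here are workload-dependent starting points, not recommendations:

```sql
-- HNSW speed/recall knob (session-level): higher = better recall, slower queries.
SET hnsw.ef_search = 40;

-- Index builds are much faster with more memory (standard Postgres setting).
SET maintenance_work_mem = '2GB';
```
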

Why we chose pgvector for SellyChat

Our decision came down to three factors:

  1. Colocation. Keeping vectors in the same database as our relational data eliminates synchronization complexity. When a customer updates their knowledge base, the vector update happens in the same transaction as the metadata update.
  2. Latency. In-process queries are faster than external API calls. For a real-time chatbot, every millisecond counts.
  3. Cost at scale. As the platform grows, each customer has their own knowledge base. A managed vector database would add significant per-customer cost that would have to be passed on in pricing.
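Point 1 is worth a concrete sketch. With a hypothetical documents table, the content update and the re-embedded vector land atomically in one transaction — something that requires application-level synchronization when vectors live in a separate service:

```sql
-- Hypothetical schema: metadata and embedding update together or not at all.
BEGIN;

UPDATE documents
SET    content    = $2,
       updated_at = now(),
       embedding  = $3   -- re-embedded chunk, vector(1536)
WHERE  id = $1;

COMMIT;
```
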

Pinecone is a great product. If you're building a standalone AI application, don't already run Postgres, and want zero operational overhead, it's a solid choice. But for a multi-tenant SaaS platform where Postgres is already the backbone, pgvector is the clear winner.

Getting started with pgvector

If you're building your own RAG pipeline, here's the minimal setup:

  1. Enable the extension: CREATE EXTENSION vector;
  2. Add a vector column: ALTER TABLE documents ADD COLUMN embedding vector(1536);
  3. Create an HNSW index: CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
  4. Query: SELECT * FROM documents ORDER BY embedding <=> $1 LIMIT 5;
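For a multi-tenant setup like ours, the same query just gains a tenant filter (column names are illustrative):

```sql
-- Scope the search to one customer's knowledge base.
SELECT id, content
FROM   documents
WHERE  customer_id = $2
ORDER  BY embedding <=> $1
LIMIT  5;
-- Note: with approximate indexes, strict filters can reduce the number of
-- rows returned; heavily filtered workloads may warrant a partial index.
```
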

If you'd rather skip the infrastructure entirely and just use a knowledge base that works out of the box, SellyChat handles all of this for you — upload your content and the AI agent takes care of the rest.


Have questions about vector search or knowledge base architecture? We're happy to chat.

Skip the infrastructure. Use SellyChat's built-in knowledge base.

Powered by pgvector under the hood — no setup required.