David Sterling  

PostgreSQL Vector Search: From Possible to Competitive with pgvector and pgvectorscale

PostgreSQL has quietly crossed an important line in the last year: AI with Postgres is no longer a science project; it is genuinely competitive with specialized vector databases at production scale.

From “possible” to “competitive”

For a long time, the story around AI and Postgres sounded like “you can do it if you really want to.” Vector search worked, but serious teams still assumed they would end up in Pinecone, Milvus, or some other dedicated vector engine once traffic and data volume grew. Over the last 12–18 months, that assumption has flipped: between pgvector, pgvectorscale, and better cloud support, Postgres is now a first-class option for high‑throughput, low‑latency vector workloads, not just a convenient starting point.

That shift matters because it lets engineering teams keep AI retrieval, transactional data, and application logic in one database, instead of stitching together a patchwork of services, each with its own operational failure modes. For a lot of shops, that translates directly into less infrastructure to run, fewer edge cases in production, and a simpler mental model for developers.

Pgvector as the default vector layer

Pgvector is the foundation that made “AI inside Postgres” feel natural rather than bolted on. It adds a native vector type plus operators and indexes for similarity search, so embeddings from text, images, or audio can live alongside your existing relational schema. You can store OpenAI, Anthropic, or local model embeddings as simple columns, index them, and query with L2, inner product, or cosine distance using familiar SQL.
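
A minimal sketch of what that looks like (the documents table and its columns are illustrative, and a 3-dimensional vector stands in for a real 768- or 1536-dimensional embedding so the example stays runnable):

    -- Enable the extension (available on most managed Postgres offerings).
    CREATE EXTENSION IF NOT EXISTS vector;

    -- Embeddings live alongside ordinary relational columns. A real
    -- embedding column would be e.g. vector(768) or vector(1536),
    -- matching the model that produced the vectors.
    CREATE TABLE documents (
        id        bigserial PRIMARY KEY,
        content   text NOT NULL,
        metadata  jsonb NOT NULL DEFAULT '{}',
        embedding vector(3)
    );

    INSERT INTO documents (content, metadata, embedding)
    VALUES ('hello world', '{"tenant_id": "acme"}', '[0.1, 0.2, 0.3]');

    -- pgvector's distance operators: <-> is L2 distance, <#> is
    -- (negative) inner product, <=> is cosine distance.
    SELECT id, content
    FROM documents
    ORDER BY embedding <=> '[0.1, 0.1, 0.1]'
    LIMIT 5;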

Because pgvector ships in Neon, Supabase, Cloud SQL, Heroku, and other managed offerings, it has effectively become the de facto standard for vector search in Postgres. That ubiquity is what unlocked the broader ecosystem: client libraries, migration guides, RAG templates, and dashboard integrations all assume that “Postgres vector search” means pgvector.

The scaling problem: big embeddings, bigger datasets

Once teams moved past toy demos, the pain points were predictable: embeddings are big, datasets grow fast, and RAM is expensive. A few patterns kept showing up:

  • Billion-scale or near-billion-scale embeddings quickly pushed HNSW and IVFFlat indexes into “terabytes of RAM” territory.
  • Latency targets in the tens of milliseconds clashed with indexes that either sat on slower disks or required aggressive caching.
  • Multi-tenant and filtered search (“only within this customer, region, or tag set”) exposed weaknesses in naïve approximate nearest neighbor (ANN) implementations that were not built around complex filters.

What pgvectorscale actually adds

You could get good performance with pgvector alone, but it often required over-provisioning memory or accepting compromises on recall, latency, or cost. That gap is exactly where Timescale’s pgvectorscale enters.

Pgvectorscale is a companion extension to pgvector that focuses on two things: better indexing at large scale and more efficient storage. It introduces a StreamingDiskANN index inspired by Microsoft’s DiskANN algorithm and a novel compression technique called Statistical Binary Quantization (SBQ), both designed specifically for disk-resident vector search.

StreamingDiskANN stores most of the index on disk and pulls only the relevant graph segments into memory on demand, which lets Postgres handle massive embedding sets without needing to keep the entire structure in RAM. SBQ, for its part, compresses vectors into a binary representation that preserves distance relationships well enough for high-recall ANN while dramatically shrinking the storage footprint. Together, they shift the scaling curve from “RAM-bound” to “SSD-bound,” which is exactly what you want as vector workloads grow.
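
In SQL terms, a sketch of how this is exposed against the documents table from earlier (the extension is published as vectorscale; the diskann access method and the query-time setting reflect current releases and may change across versions):

    -- pgvectorscale builds on pgvector, so both extensions are needed.
    CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

    -- A StreamingDiskANN index over the embedding column; SBQ
    -- compression is handled internally by the index.
    CREATE INDEX documents_embedding_idx
        ON documents
        USING diskann (embedding vector_cosine_ops);

    -- Recall vs. latency can be tuned per session at query time.
    SET diskann.query_search_list_size = 100;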

The Pinecone comparison in real numbers

Timescale’s public benchmarks are where the story goes from “interesting” to “this changes architecture decisions.” On a 50-million embedding dataset (Cohere, 768 dimensions), PostgreSQL with pgvector plus pgvectorscale was measured against Pinecone’s storage-optimized (s1) and performance-optimized (p2) indexes.

The s1 comparison is blunt: Postgres with pgvector + pgvectorscale achieved roughly 28x lower p95 latency and 16x higher query throughput than Pinecone s1 for approximate nearest neighbor queries at 99% recall, while costing about 75% less per month when self-hosted on AWS. Against Pinecone’s performance-optimized p2 tier, Postgres still delivered about 1.4x lower p95 latency and 1.5x higher throughput at 90% recall, at roughly 21% of the monthly cost.

These aren’t edge-case microbenchmarks. They reflect realistic RAG-style workloads with large embedding sets, production-grade recall targets, and cost profiles that matter to engineering leaders. The more your dataset grows, the more compelling it becomes to keep vector search inside Postgres rather than paying a memory-heavy premium for a separate vector database.

Why this changes system design

The performance numbers only tell part of the story. The other part is architectural simplicity. When vector search lives inside Postgres:

  • You keep ACID transactions, point-in-time recovery, and standard Postgres backups across both structured and vector data.
  • You can join vectors directly with relational tables and JSONB documents instead of synchronizing them across two or three backends.
  • You can filter vector search with the same WHERE clauses you already use—tenant IDs, tags, feature flags—because StreamingDiskANN is explicitly designed for “streaming filtering” scenarios that combine similarity with secondary predicates.

In practice, that means simpler ETL, fewer race conditions between systems, and a smaller operational surface area. For example, a multi-tenant SaaS can store tenant metadata as JSONB, embeddings as pgvector columns, and then run filtered ANN queries that respect tenant boundaries and feature flags in a single SQL statement.
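
A sketch of such a query, reusing the hypothetical documents table and its metadata JSONB column from the earlier examples:

    -- Tenant scoping and similarity ranking in a single statement:
    -- the JSONB containment predicates restrict candidates, the
    -- vector operator orders them.
    SELECT id, content, metadata
    FROM documents
    WHERE metadata @> '{"tenant_id": "acme"}'
      AND metadata @> '{"flags": ["beta_search"]}'
    ORDER BY embedding <=> '[0.1, 0.1, 0.1]'
    LIMIT 10;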

JSON, transactions, and AI inside Postgres

Postgres’ JSONB support is an underrated piece of this puzzle. A typical RAG system needs to capture not just raw text and embeddings, but also metadata like document type, tags, access control, and enrichment signals. JSONB is a natural home for that metadata, and combining JSONB with GIN indexes, generated columns, and vector indexes lets you build rich retrieval pipelines without bolting on a document store.
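
For instance, on the same hypothetical documents table (index and column names are illustrative):

    -- A GIN index accelerates JSONB containment filters like the
    -- tenant predicate shown earlier.
    CREATE INDEX documents_metadata_idx
        ON documents
        USING gin (metadata jsonb_path_ops);

    -- A generated column lifts a hot metadata field into a plain,
    -- b-tree-indexable column.
    ALTER TABLE documents
        ADD COLUMN tenant_id text
        GENERATED ALWAYS AS (metadata ->> 'tenant_id') STORED;
    CREATE INDEX documents_tenant_idx ON documents (tenant_id);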

Add in transactional guarantees—embedding inserts, metadata updates, and content changes can all commit or roll back together—and you end up with a very different failure profile than a split Postgres + external vector database architecture. For teams that have already standardized on Postgres for core data, that consistency is hard to beat.
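
A sketch of that atomicity, again against the hypothetical documents table:

    BEGIN;

    -- New content, its metadata, and its embedding land together...
    INSERT INTO documents (content, metadata, embedding)
    VALUES ('release notes v2',
            '{"tenant_id": "acme", "doc_type": "notes"}',
            '[0.2, 0.1, 0.4]');

    -- ...alongside a metadata update on related rows.
    UPDATE documents
    SET metadata = metadata || '{"superseded": true}'
    WHERE metadata ->> 'doc_type' = 'notes'
      AND content <> 'release notes v2';

    COMMIT;  -- all of it commits, or none of it does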

Where pgvector + pgvectorscale fit today

None of this means “never use a dedicated vector database.” If your organization already runs Pinecone at scale and your application is deeply integrated with its ecosystem, the migration calculus is different. But for greenfield builds or teams already deep in Postgres, the default has flipped: start in Postgres, and only move out if you have a very specific reason.

The heaviest vector workloads—multi-billion embeddings, extreme tail latency requirements, or exotic hardware—will keep pushing the ecosystem forward, and not every use case will land on the same answer. What has changed is that PostgreSQL with pgvector and pgvectorscale is no longer a compromise; it is a competitive, cost-efficient, and operationally sane way to run serious AI workloads, with the full power of SQL and JSON sitting right next to your embeddings.
