pgvector + Aurora PostgreSQL for embeddings: why I picked it (and how I use it)

Every AI application that outgrows the toy phase eventually hits the same wall:

where do I store embeddings and how do I run similarity search without wrecking latency and cost.

The industry answered that with a wave of dedicated vector databases — Pinecone, Weaviate, Qdrant, Milvus, Chroma. Each one with its own pitch, its own SDK, and its own line on next month's bill.

When I had to solve this for real in an agentic product — storing content embeddings, running RAG, filtering by metadata, living next to relational transactions — I didn't reach for any of them.

I went with Aurora PostgreSQL + pgvector.

This post is the reasoning behind that decision and a hands-on implementation guide with Drizzle.

The problem, plainly

AI applications usually have three classes of data sitting side by side:

Classic relational data — users, projects, permissions, billing.
Embeddings — 768/1024/1536/3072-dimension vectors representing chunks of text, code, or images.
Metadata — tags, author, language, source type, timestamps, quality scores.

The actually interesting query is almost never "give me the top 10 nearest vectors." It's:

"Find the 10 chunks most similar to this question, in project X, in English, created in the last 90 days, excluding the ones this user already saw."

That's similarity search plus relational filters plus joins. In a dedicated vector DB this turns into gymnastics: you duplicate metadata into payloads, keep two systems in sync, pay bandwidth to ship IDs back so the app layer can do the filtering.

In Postgres, that's a single query.

Options on the table

Honest summary of the alternatives I considered:

Pinecone — managed, fast, great DX. But it's one more SaaS, priced per dimension/vector, with zero relationship to the rest of your schema. You'll duplicate metadata and write sync code.
Weaviate / Qdrant — open-source, performant, with solid filtering. Running them yourself means one more cluster to babysit. Managed options exist; that's just another vendor.
Chroma / Milvus — great for local dev and prototypes; in serious production they hit the same trade-offs above.
pgvector inside Postgres — open-source extension maintained by Andrew Kane (same author of pg_search and others). C plugin, performant, with HNSW and IVFFlat indexes and L2, inner product, cosine, Hamming, and Jaccard distances.

The real question isn't "which is fastest on a synthetic benchmark." It's which system causes the least operational friction for my case.

Why Aurora + pgvector won

Four reasons, in order of weight:

1. A single source of truth. Embeddings, metadata, product data — all in one database. Real joins, ACID transactions, foreign keys, RLS if you want it. No sync job. No drift between two systems.

2. Aurora handles the ops. Auto-scaling storage, multi-AZ read replicas, snapshots, point-in-time recovery, managed failover. I don't need to operate a separate vector cluster. The data team already knows Postgres.

3. pgvector is mature. The HNSW index (Hierarchical Navigable Small World) gave me recall and latency comparable to dedicated vector DBs on my workload (~1M vectors at 1536 dimensions). And pgvector has been officially supported on Aurora since PostgreSQL 15.

4. Predictable cost. I'm already paying for Aurora. There's no extra line item that scales with vector count.

When NOT to go this route

To be fair, pgvector is not the best fit if:

You have billions of vectors and the vector store is your core product — you'll need something more specialized (Vespa, Milvus in cluster, Pinecone enterprise).
Your workload is 99% similarity search with no relational component — then Postgres overhead doesn't pay off.
You need aggressive pre-filter search at extreme throughput — some dedicated vector DBs optimize that better than the Postgres planner.

For most agentic products — RAG over your own content, agent memory, semantic search across a catalog — pgvector is more than enough.

Setup on Aurora

Aurora PostgreSQL ships pgvector pre-installed starting at version 15.2. You just enable it:

CREATE EXTENSION IF NOT EXISTS vector;

Verify the version (ideally 0.7.x or newer, which brings mature HNSW and halfvec half-precision support):

SELECT extversion FROM pg_extension WHERE extname = 'vector';

Schema with Drizzle

Drizzle gained first-class vector support in mid-2024. It feels natural:

// db/schema.ts
import { pgTable, uuid, text, timestamp, jsonb, vector, index } from "drizzle-orm/pg-core";
 
export const documents = pgTable(
  "documents",
  {
    id: uuid("id").primaryKey().defaultRandom(),
    projectId: uuid("project_id").notNull(),
    source: text("source").notNull(),
    chunk: text("chunk").notNull(),
    metadata: jsonb("metadata").$type<Record<string, unknown>>().default({}),
    embedding: vector("embedding", { dimensions: 1536 }),
    createdAt: timestamp("created_at").defaultNow().notNull(),
  },
  (t) => ({
    embeddingIdx: index("documents_embedding_idx")
      .using("hnsw", t.embedding.op("vector_cosine_ops"))
      .with({ m: 16, ef_construction: 64 }),
    projectIdx: index("documents_project_idx").on(t.projectId),
  }),
);

Key points:

dimensions: 1536 matches OpenAI's text-embedding-3-small. For text-embedding-3-large use 3072. For voyage-3 use 1024.
The index is HNSW with vector_cosine_ops because most embedding models normalize vectors and cosine is the expected metric. For inner product use vector_ip_ops, for L2 use vector_l2_ops.
m=16 and ef_construction=64 are good defaults. Raising m improves recall at the cost of index memory; raising ef_construction improves build quality at the cost of build time.

Generating embeddings (AI SDK)

// lib/embed.ts
import { embed, embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
 
export async function embedOne(text: string) {
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"),
    value: text,
  });
  return embedding;
}
 
export async function embedBatch(texts: string[]) {
  const { embeddings } = await embedMany({
    model: openai.embedding("text-embedding-3-small"),
    values: texts,
  });
  return embeddings;
}

In production, prefer embedMany in batches of 96–128 to amortize HTTP latency.

Inserting

import { db } from "./db";
import { documents } from "./db/schema";
import { embedBatch } from "./lib/embed";
 
export async function ingest(
  projectId: string,
  source: string,
  chunks: string[],
) {
  const embeddings = await embedBatch(chunks);
 
  await db.insert(documents).values(
    chunks.map((chunk, i) => ({
      projectId,
      source,
      chunk,
      embedding: embeddings[i],
    })),
  );
}

Query: similarity + relational filter

This is where having everything in Postgres pays off.

import { sql, and, eq, gte } from "drizzle-orm";
 
export async function search(opts: {
  projectId: string;
  query: string;
  language: string;
  sinceDays?: number;
  k?: number;
}) {
  const queryEmbedding = await embedOne(opts.query);
  const k = opts.k ?? 10;
  const since = new Date(Date.now() - (opts.sinceDays ?? 90) * 86_400_000);
 
  const distance = sql<number>`${documents.embedding} <=> ${queryEmbedding}`;
 
  return db
    .select({
      id: documents.id,
      chunk: documents.chunk,
      source: documents.source,
      distance,
    })
    .from(documents)
    .where(
      and(
        eq(documents.projectId, opts.projectId),
        sql`${documents.metadata}->>'language' = ${opts.language}`,
        gte(documents.createdAt, since),
      ),
    )
    .orderBy(distance)
    .limit(k);
}

The <=> operator is cosine distance in pgvector (<-> is L2, <#> is negative inner product). Because the index was built with vector_cosine_ops, the planner will use it on the ORDER BY.

Runtime tuning worth its weight in gold:

SET hnsw.ef_search = 100;

This parameter controls the recall/latency trade-off at query time. Default is 40. Bumping it to 100–200 improves recall significantly with modest latency cost.

What about Prisma?

Prisma still lacks first-class vector support, but you can use it via Unsupported in the schema and $queryRaw in the query:

model Document {
  id        String   @id @default(uuid()) @db.Uuid
  projectId String   @db.Uuid
  chunk     String
  embedding Unsupported("vector(1536)")
  createdAt DateTime @default(now())
}

const results = await prisma.$queryRaw<Array<{ id: string; chunk: string; distance: number }>>`
  SELECT id, chunk, embedding <=> ${queryEmbedding}::vector AS distance
  FROM "Document"
  WHERE "projectId" = ${projectId}::uuid
  ORDER BY distance
  LIMIT ${k};
`;

It works, but you lose type safety inside the vector and the index has to be created via manual migration. That's why Drizzle ended up the default.

Gotchas I learned the hard way

Create the HNSW index AFTER loading data for big ingests. Building the index is expensive; loading with the index already in place is much slower.
Use halfvec if latency/memory get tight — pgvector 0.7 introduced half-precision (16-bit) vectors that cut index memory in half with negligible recall loss for most cases.
Don't mix embedding models. Vectors from different models are not comparable. If you change models, reindex everything.
Beware of giant chunks. Embeddings lose fidelity above ~500 tokens. Use chunking with overlap (~50 tokens) and keep chunks semantically coherent.
For high scale, move the index to an Aurora read replica and route similarity queries there. Your primary's write workload won't feel it.

Closing

The stack decision for embeddings isn't purely technical — it's about how many systems you want to operate.

Every extra service is one more SDK, one more SLA, one more bill, one more mental model, one more failure mode. Postgres with pgvector folds that into a single place, with performance good enough for the overwhelming majority of agentic products today.

The rule I follow:

Start with pgvector. Move off it only when you have concrete evidence it no longer fits.

In practice, that moment takes a lot longer to arrive than the dedicated-vector-DB hype suggests.