
Pinecone vs Qdrant vs Weaviate 2026

By the APIScout Team

TL;DR

Qdrant for performance-critical production workloads — Rust-based, 20ms p95 latency, 15K QPS, and the best payload filtering in the category. Pinecone for teams who want zero database operations — fully managed, consistent performance, and the simplest API. Weaviate for hybrid search (vector + keyword BM25) — its native BM25 integration and GraphQL API make it the best choice when you need both semantic and keyword search in the same index. Self-hosting Weaviate or Qdrant saves 60–70% versus Pinecone at scale.

Key Takeaways

  • Qdrant: 20ms p95 latency, 15K QPS, Rust-based, best payload filtering, self-host or managed cloud
  • Pinecone: 50ms p95 latency, 10K QPS, serverless ($0.33/GB/month), zero infrastructure, SOC 2 + ISO 27001 + HIPAA
  • Weaviate: 30ms p95 latency, 5K QPS, best hybrid search, GraphQL API, module ecosystem (vectorizers, generative)
  • Cost at scale (1B vectors): Pinecone ~$3,500/month managed; Weaviate Cloud ~$2,200/month; Qdrant Cloud ~$1,000/month; self-hosted ~$800/month
  • pgvector: For Postgres shops with <10M vectors and <100 QPS — free with your existing database

The Vector Database Landscape in 2026

Vector databases store high-dimensional embeddings (typically 1536-dimensional for OpenAI text-embedding-3-small) and perform approximate nearest-neighbor (ANN) search. The choice matters at scale — at 100M vectors, the difference between a well-optimized database and a poorly-chosen one is 10x on cost and 5x on latency.
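For intuition, exact nearest-neighbor search is just a brute-force scan over every vector. This sketch (pure Python, toy 3-dimensional vectors rather than 1536) shows the linear scan that ANN indexes like HNSW approximate in sub-linear time:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def exact_search(query: list[float], corpus: dict[str, list[float]], top_k: int = 2):
    """Exact k-NN: score every vector, then sort. O(n * d) per query.
    ANN indexes trade a little recall for sub-linear query time."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

corpus = {
    "invoice": [0.9, 0.1, 0.0],
    "contract": [0.8, 0.3, 0.1],
    "recipe": [0.0, 0.2, 0.9],
}
print(exact_search([1.0, 0.0, 0.0], corpus))
# "invoice" and "contract" rank above "recipe" for a payments-like query
```

At 100M vectors this scan is far too slow per query, which is what motivates the ANN indexes every database below builds on.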

The 2026 landscape has three main segments:

Segment 1: Managed simplicity
  Pinecone — zero infrastructure, serverless, best for teams without MLOps

Segment 2: Self-hosted performance
  Qdrant — Rust performance, on-prem or cloud
  Weaviate — feature-rich, strong hybrid search

Segment 3: Embedded/lightweight
  Chroma — local dev, prototyping
  LanceDB — edge deployment
  pgvector — Postgres extension for small-medium workloads

Pinecone

Getting Started

import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create a serverless index
pc.create_index(
    name="rag-documents",
    dimension=1536,  # Match your embedding model dimensions
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1",
    ),
)

# Connect to index
index = pc.Index("rag-documents")

Upsert and Query

from openai import OpenAI

openai_client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        input=text,
        model="text-embedding-3-small",
    )
    return response.data[0].embedding

from datetime import datetime

# Upsert documents
vectors = []
for doc in documents:
    embedding = get_embedding(doc.content)
    vectors.append({
        "id": doc.id,
        "values": embedding,
        "metadata": {
            "text": doc.content,
            "source": doc.source,
            # Store as a numeric timestamp: Pinecone's $gte/$lte
            # comparisons work on numbers, not ISO date strings
            "created_at": doc.created_at.timestamp(),
        },
    })

# Batch upsert (batches of ~100 vectors are the recommended size)
for i in range(0, len(vectors), 100):
    index.upsert(vectors=vectors[i:i+100])

# Query
query_embedding = get_embedding("What are the payment terms?")

results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    # Filter by metadata (range operators require numeric values)
    filter={
        "source": {"$eq": "contracts"},
        "created_at": {"$gte": datetime(2025, 1, 1).timestamp()},
    },
)

for match in results.matches:
    print(f"Score: {match.score:.4f}")
    print(f"Text: {match.metadata['text'][:200]}")
    print("---")

Namespaces for Multi-Tenancy

# Pinecone namespaces for tenant isolation
index.upsert(
    vectors=vectors,
    namespace=f"tenant-{tenant_id}",  # Isolated per tenant
)

results = index.query(
    vector=query_embedding,
    top_k=5,
    namespace=f"tenant-{tenant_id}",
    include_metadata=True,
)

Serverless Pricing (2026)

Pinecone Serverless:
  Storage:  $0.33/GB/month
  Writes:   $2.00/million write units
  Reads:    $4.00/million read units

Approximate at 1M documents (1536-dim, float32):
  Storage:  ~6GB → ~$2/month
  Monthly queries (100K/day): ~$12/month
  Monthly ingestion (100K docs): ~$0.20
  Total: ~$14/month at low volume

At 100M documents, 1M queries/day:
  Storage:  ~600GB → $198/month
  Queries:  30M/month → $120/month
  Total:    ~$318/month (light usage at scale)
  Costs scale roughly linearly with storage and query volume
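The arithmetic above is easy to wrap in a helper. Note the assumptions: float32 vectors and ~1 read unit per query, which matches the rough figures here but understates Pinecone's real read-unit accounting (read units scale with data scanned), so treat the result as a lower bound:

```python
def pinecone_serverless_estimate(
    n_vectors: int,
    queries_per_day: int,
    dims: int = 1536,
    storage_per_gb: float = 0.33,    # $/GB/month, serverless pricing above
    read_per_million: float = 4.00,  # $/million read units
) -> dict[str, float]:
    """Back-of-envelope monthly cost. Assumes float32 storage and
    ~1 read unit per query — a lower bound, not a billing simulator."""
    storage_gb = n_vectors * dims * 4 / 1e9  # float32 = 4 bytes/dim
    storage_cost = storage_gb * storage_per_gb
    monthly_queries = queries_per_day * 30
    query_cost = monthly_queries / 1e6 * read_per_million
    return {
        "storage_gb": round(storage_gb, 1),
        "storage_usd": round(storage_cost, 2),
        "query_usd": round(query_cost, 2),
        "total_usd": round(storage_cost + query_cost, 2),
    }

print(pinecone_serverless_estimate(1_000_000, 100_000))
# ≈ the ~$14/month figure above (writes excluded)
```

Running it for 100M vectors at 1M queries/day lands near the ~$318/month figure in the breakdown above.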

Qdrant

Qdrant is the performance leader — written in Rust, it handles complex payload filtering without sacrificing search speed.

Getting Started

import os

from qdrant_client import QdrantClient, models

# Self-hosted (Docker)
client = QdrantClient(host="localhost", port=6333)

# Qdrant Cloud
client = QdrantClient(
    url="https://your-cluster-url.qdrant.io",
    api_key=os.environ["QDRANT_API_KEY"],
)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=models.VectorParams(
        size=1536,
        distance=models.Distance.COSINE,
        on_disk=True,  # For large collections
    ),
    # HNSW configuration for performance tuning
    hnsw_config=models.HnswConfigDiff(
        m=16,              # Higher = better recall, more memory
        ef_construct=100,  # Higher = slower indexing, better quality
    ),
)

Upsert with Rich Payloads

from qdrant_client.models import PointStruct

# Qdrant uses "points" with payloads (equivalent to metadata)
points = [
    PointStruct(
        id=i,  # Integer or UUID
        vector=embedding,
        payload={
            "text": doc.content,
            "source": doc.source,
            "department": doc.department,
            "access_level": doc.access_level,
            "created_at": doc.created_at.timestamp(),
            "word_count": len(doc.content.split()),
        },
    )
    for i, (doc, embedding) in enumerate(zip(documents, embeddings))
]

client.upsert(
    collection_name="documents",
    points=points,
)

Advanced Filtering (Qdrant's Strength)

Qdrant's payload filtering is the most expressive in the category:

from qdrant_client.models import Filter, FieldCondition, Range, MatchValue, MatchAny

# Complex filter: department = legal AND access_level >= 2 AND recent
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="department",
                match=MatchValue(value="legal"),
            ),
            FieldCondition(
                key="access_level",
                range=Range(gte=2),
            ),
            FieldCondition(
                key="created_at",
                range=Range(
                    gte=datetime(2025, 1, 1).timestamp(),
                ),
            ),
        ],
        should=[
            FieldCondition(
                key="source",
                match=MatchAny(any=["contracts", "agreements"]),
            ),
        ],
    ),
    limit=10,
    with_payload=True,
    with_vectors=False,  # Don't return vectors (saves bandwidth)
)

for result in results:
    print(f"Score: {result.score:.4f} | Source: {result.payload['source']}")
    print(result.payload['text'][:200])

Hybrid Search with Sparse Vectors

from qdrant_client.models import SparseVector, SparseVectorParams

# Create collection with both dense and sparse vectors
client.create_collection(
    collection_name="hybrid_docs",
    vectors_config={
        "dense": models.VectorParams(size=1536, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams(
            index=models.SparseIndexParams(on_disk=True),
        ),
    },
)

# Query with both (RRF fusion) — the Query API selects named vectors
# via `using=`, so no extra imports are needed beyond `models`
results = client.query_points(
    collection_name="hybrid_docs",
    prefetch=[
        models.Prefetch(
            query=dense_embedding,
            using="dense",  # which named vector to search
            limit=20,
        ),
        models.Prefetch(
            query=models.SparseVector(
                indices=sparse_indices,
                values=sparse_values,
            ),
            using="sparse",
            limit=20,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=5,
)
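RRF (reciprocal rank fusion) is simple enough to sketch by hand: each document's fused score is the sum of 1/(k + rank) across the result lists, with k conventionally set to 60. A pure-Python illustration:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d).
    Ranks are 1-based; documents absent from a list contribute nothing.
    k=60 is the conventional constant from the original RRF paper."""
    scores: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

dense_hits = ["doc-7", "doc-2", "doc-9"]   # semantic ranking
sparse_hits = ["doc-2", "doc-4", "doc-7"]  # keyword ranking
print(rrf_fuse([dense_hits, sparse_hits]))
# doc-2 and doc-7, present in both lists, outrank the single-list hits
```

Because RRF only uses ranks, it needs no score normalization between the dense and sparse lists, which is why it is the default fusion choice here.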

Weaviate

Weaviate is the hybrid search specialist. Its native BM25 index means you don't need a separate Elasticsearch instance for keyword search.
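BM25 itself is compact. A minimal single-field implementation (illustrative only: whitespace/lowercase tokenization, not Weaviate's tokenizer or parameter defaults):

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Minimal BM25: term-frequency saturation via k1, document-length
    normalization via b. Production engines add stemming, stopwords,
    and per-property boosts on top of this core formula."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores

docs = [
    "payment due within NET30 terms",
    "shipping terms and delivery schedule",
    "employee onboarding checklist",
]
print(bm25_scores("payment NET30", docs))
# the first document scores highest: it contains both query terms
```

Exact-token matches like "NET30" are precisely where BM25 beats pure vector search, which is the case hybrid search is built for.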

Getting Started

import os

import weaviate
import weaviate.classes as wvc

# Connect to Weaviate Cloud (WCS)
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WEAVIATE_URL"],
    auth_credentials=weaviate.auth.AuthApiKey(
        os.environ["WEAVIATE_API_KEY"]
    ),
)

# Create collection
client.collections.create(
    name="Document",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    generative_config=wvc.config.Configure.Generative.openai(
        model="gpt-4o",
    ),
    properties=[
        wvc.config.Property(
            name="content",
            data_type=wvc.config.DataType.TEXT,
        ),
        wvc.config.Property(
            name="source",
            data_type=wvc.config.DataType.TEXT,
            skip_vectorization=True,
        ),
        wvc.config.Property(
            name="department",
            data_type=wvc.config.DataType.TEXT,
            skip_vectorization=True,
        ),
    ],
)

Hybrid Search (Weaviate's Strength)

documents = client.collections.get("Document")

# Pure vector search
vector_results = documents.query.near_text(
    query="payment terms and conditions",
    limit=5,
    return_metadata=wvc.query.MetadataQuery(distance=True),
)

# Pure keyword search (BM25)
keyword_results = documents.query.bm25(
    query="payment terms NET30",
    limit=5,
    return_metadata=wvc.query.MetadataQuery(score=True),
)

# Hybrid search (vector + BM25 combined) — Weaviate's signature feature
hybrid_results = documents.query.hybrid(
    query="payment terms NET30",
    alpha=0.5,  # 0 = pure BM25, 1 = pure vector, 0.5 = balanced
    limit=5,
    filters=wvc.query.Filter.by_property("department").equal("legal"),
    return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True),
)

for result in hybrid_results.objects:
    print(f"Score: {result.metadata.score:.4f}")
    print(f"Content: {result.properties['content'][:200]}")
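Weaviate's hybrid scoring blends normalized vector and BM25 scores by alpha. The mechanics are easy to sketch; this is an illustrative pure-Python version assuming min-max normalization, not Weaviate's exact fusion scheme:

```python
def relative_score_fusion(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.5,
) -> dict[str, float]:
    """Alpha-weighted hybrid fusion sketch: min-max normalize each
    result set to [0, 1], then blend. alpha=1 -> pure vector,
    alpha=0 -> pure BM25. Illustrative only — Weaviate's
    relativeScoreFusion has its own normalization details."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    v, kw = normalize(vector_scores), normalize(keyword_scores)
    doc_ids = set(v) | set(kw)
    return {
        d: alpha * v.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
        for d in doc_ids
    }

vector_hits = {"doc-1": 0.92, "doc-2": 0.85, "doc-3": 0.40}
keyword_hits = {"doc-2": 7.1, "doc-4": 6.0, "doc-1": 1.2}
fused = relative_score_fusion(vector_hits, keyword_hits, alpha=0.5)
print(max(fused, key=fused.get))
# doc-2 ranks first: it is strong in both result sets
```

Normalization is the important step: raw BM25 scores are unbounded while cosine distances sit in a narrow range, so blending without it would let one modality dominate regardless of alpha.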

Generative Search (RAG in One Query)

Weaviate's generative modules perform the LLM call as part of the search request, so retrieval and generation happen in one round trip:

# Single query: search + generate response
response = documents.generate.hybrid(
    query="What are the payment terms in our enterprise contracts?",
    alpha=0.5,
    limit=3,
    # RAG: generate a response using the retrieved documents
    grouped_task="Summarize the payment terms found in these documents. "
                 "Format as a bullet list with key terms highlighted.",
)

print(response.generated)  # LLM-generated summary
for obj in response.objects:
    print(f"Source: {obj.properties['source']}")

Performance Comparison

Benchmark setup: 100M vectors, 1536 dimensions, 10% payload filter

Latency (p50 / p95 / p99):
  Pinecone:  12ms / 50ms / 85ms
  Qdrant:     8ms / 20ms / 35ms
  Weaviate:  10ms / 30ms / 55ms

Throughput (concurrent requests):
  Pinecone:  10,000 QPS (managed, auto-scales)
  Qdrant:    15,000 QPS (self-hosted, 32-core)
  Weaviate:   5,000 QPS (self-hosted, 32-core)

With complex payload filter (3 conditions):
  Pinecone:  +8ms latency overhead (metadata index)
  Qdrant:    +2ms latency overhead (native HNSW+filter)
  Weaviate:  +5ms latency overhead

Qdrant's HNSW+filter implementation is the most efficient —
payload filtering runs during graph traversal, not as post-filter.
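The distinction is easy to see in miniature. Post-filtering retrieves the global top-k first and filters afterward, which can return fewer than k results when the filter is selective; filter-aware traversal applies the predicate during the search (pure-Python sketch; ranked_ids stands in for a similarity-ordered candidate stream):

```python
def post_filter_search(ranked_ids, payloads, predicate, k=5):
    """Post-filter: take the global top-k, THEN filter.
    If most of the top-k fails the predicate, you return < k hits."""
    return [d for d in ranked_ids[:k] if predicate(payloads[d])]

def filtered_search(ranked_ids, payloads, predicate, k=5):
    """Filter-aware search: skip non-matching points during traversal,
    so the result is always the true top-k among matching documents."""
    hits = []
    for d in ranked_ids:
        if predicate(payloads[d]):
            hits.append(d)
            if len(hits) == k:
                break
    return hits

ranked_ids = [f"doc-{i}" for i in range(20)]
payloads = {
    d: {"department": "legal" if i % 4 == 0 else "sales"}
    for i, d in enumerate(ranked_ids)
}
is_legal = lambda p: p["department"] == "legal"

print(post_filter_search(ranked_ids, payloads, is_legal, k=5))  # only 2 hits survive
print(filtered_search(ranked_ids, payloads, is_legal, k=5))     # full 5 legal hits
```

Real HNSW traversal is a graph walk rather than a sorted stream, but the recall problem is the same: selective filters starve post-filtering, while filter-aware traversal keeps searching until it finds k matches.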

Cost Comparison at Scale

Scale          Pinecone Cloud   Weaviate Cloud   Qdrant Cloud    Self-hosted
1M vectors     ~$14/month       ~$45/month       ~$20/month      ~$15/month
10M vectors    ~$50/month       ~$120/month      ~$60/month      ~$50/month
100M vectors   ~$350/month      ~$800/month      ~$400/month     ~$200/month
1B vectors     ~$3,500/month    ~$2,200/month    ~$1,000/month   ~$800/month

Estimates based on 1536-dim vectors, moderate query volume (100K queries/day), 2026 pricing.


Feature Comparison

Feature              Pinecone          Qdrant             Weaviate
Managed cloud        ✅ Only            ✅ + self-host      ✅ + self-host
Open source          ❌                ✅ Apache 2.0       ✅ BSD 3
Hybrid search        ⚠️ Manual          ✅ Sparse vectors   ✅ Native BM25
GraphQL API          ❌                ❌                 ✅
REST API             ✅                ✅                 ✅
gRPC API             ✅ (SDK option)    ✅                 ✅ (v4 client)
Built-in vectorizer  ❌                ❌                 ✅ (module system)
Generative search    ❌                ❌                 ✅ (RAG in one call)
Multi-tenancy        ✅ Namespaces      ✅ Collections      ✅ Native MT API
SOC 2 Type II        ✅                ✅ Cloud            ✅ Cloud
HIPAA                ✅ Enterprise      ❌                 ✅ Enterprise Cloud
Payload filtering    ✅ Metadata        ✅ Best-in-class    ✅ Good
On-disk storage      ❌ (managed)       ✅                 ✅
GPU acceleration     ❌                ✅ (indexing)       ❌

When pgvector Is Enough

Before committing to a dedicated vector DB, consider pgvector:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Add vector column to existing table
ALTER TABLE documents
ADD COLUMN embedding vector(1536);

-- Create HNSW index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query
SELECT id, content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;

pgvector is the right choice if:

  • You're already on Postgres (Supabase, Neon, PlanetScale)
  • Vectors < 10M
  • Query volume < 100 QPS
  • You don't want another service to manage

Beyond those bounds, dedicated vector databases win on performance.


Decision Guide

Choose Pinecone if:

  • You want zero infrastructure — no Docker, no k8s, no ops
  • Your team has no MLOps resources
  • HIPAA compliance is required (Enterprise tier)
  • You're starting out and want to iterate fast

Choose Qdrant if:

  • Performance is critical — lowest latency, highest throughput
  • You need complex payload filtering (multiple conditions, nested objects)
  • You're comfortable with self-hosting or Qdrant Cloud
  • Cost at scale matters — significantly cheaper than Pinecone managed

Choose Weaviate if:

  • Hybrid search (semantic + keyword) is a core requirement
  • You want built-in vectorization (no separate embedding service)
  • Generative search (RAG in one query) simplifies your architecture
  • GraphQL API fits your existing patterns

Production Multi-Tenancy Patterns

Multi-tenancy is one of the first architectural decisions you'll face in production: do you isolate tenants with separate collections/indexes, or use a shared collection with a tenant identifier in the payload?

Pinecone namespaces are the most straightforward approach. Each namespace within an index is isolated — vectors from tenant-A are never returned when you query tenant-B, even without an explicit filter. The overhead is minimal: namespaces share the underlying index infrastructure but have strict logical separation. For B2B SaaS applications with hundreds or low thousands of tenants, Pinecone namespaces provide sufficient isolation without operational complexity. The limitation is that you cannot query across namespaces in a single request, which matters if your application has cross-tenant analytics or admin views.

Qdrant offers two distinct strategies. The first is collection-per-tenant: each customer gets their own Qdrant collection. This gives full isolation — a tenant's data can be deleted cleanly with a single delete_collection call, HNSW parameters can be tuned per tenant, and a noisy tenant cannot affect search latency for others. The downside is overhead: each collection maintains its own HNSW graph, and at thousands of tenants, the memory footprint grows substantially. The second strategy is payload-filter-per-tenant: store all vectors in one collection with a tenant_id payload field, and include a FieldCondition in every query. This is cheaper at high tenant counts but provides no isolation guarantee at the API level — a bug that omits the filter leaks data across tenants. For most B2B SaaS workloads, collection-per-tenant is the right default.

Weaviate's native Multi-Tenancy API sits between these approaches. You define a collection with multiTenancyConfig.enabled: true, then activate individual tenants as needed. Each tenant gets a separate HNSW graph within the same collection schema, enforced at the storage layer rather than the application layer. Tenants can be deactivated (data is offloaded to disk but retained) or deleted cleanly. This is the most production-ready isolation model for high-tenant-count applications: you get per-tenant storage isolation without the full overhead of separate collections, and Weaviate enforces the boundary so application-layer bugs cannot leak data.

The practical cost implication: at low tenant counts (<100), any approach works and the difference is negligible. At thousands of tenants, Weaviate's MT API or Qdrant collection-per-tenant (with careful resource management) are preferable to Pinecone namespaces, which scale linearly with storage costs. At hundreds of thousands of tenants (consumer-scale), payload-filter approaches become necessary regardless of database — the per-tenant collection overhead becomes unmanageable.


Operational Considerations for Self-Hosting

Pinecone is managed-only and requires no operational investment. Qdrant and Weaviate are open source and can be self-hosted, but their operational footprints differ meaningfully.

Self-hosting Qdrant is relatively straightforward. The minimal setup is a single Docker container:

docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant

For production, the Qdrant Helm chart handles Kubernetes deployment, horizontal scaling, and persistent volume configuration. Memory planning: with on_disk=True (recommended for large collections), Qdrant memory-maps the HNSW graph from disk, keeping RAM usage proportional to the hot portion of the index rather than the full dataset. A 10M vector collection at 1536 dimensions uses roughly 60GB of storage but can be queried with 4–8GB of RAM if access patterns are skewed. Backup is handled via Qdrant's Snapshot API — POST /collections/{collection_name}/snapshots creates a portable snapshot that can be uploaded to object storage. The /metrics endpoint exposes Prometheus-compatible metrics; the community maintains Grafana dashboards covering QPS, latency percentiles, and memory usage.
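The memory-planning arithmetic above generalizes into a rough sizing helper. Assumptions are labeled and approximate: float32 vectors, HNSW link overhead of about m × 2 links × 4 bytes per point; real overhead varies by engine and version:

```python
def hnsw_size_estimate(n_vectors: int, dims: int, m: int = 16) -> dict[str, float]:
    """Rough HNSW sizing. Assumptions (approximate, not engine-exact):
    - float32 vectors: 4 bytes per dimension
    - graph links: ~m * 2 neighbors per point, 4 bytes per link
    Real deployments add payload storage, WAL, and tombstones on top."""
    vector_bytes = n_vectors * dims * 4
    link_bytes = n_vectors * m * 2 * 4
    return {
        "vectors_gb": round(vector_bytes / 1e9, 1),
        "links_gb": round(link_bytes / 1e9, 2),
        "total_gb": round((vector_bytes + link_bytes) / 1e9, 1),
    }

print(hnsw_size_estimate(10_000_000, 1536))
# ≈ 61GB of raw vectors — matching the "roughly 60GB of storage" figure above
```

The takeaway: vectors dominate the footprint at high dimensionality, while the graph links are comparatively cheap, which is why on-disk vector storage with a memory-mapped graph works so well.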

Self-hosting Weaviate requires more planning. Weaviate is memory-hungry by design: by default the full HNSW graph and vectors live in RAM, so plan for RAM on the order of the raw vector data size or more — product quantization (PQ) can shrink the footprint substantially at some recall cost. A 10M vector collection at 1536 dimensions occupies roughly 60GB of raw vectors, so an uncompressed in-memory deployment needs a correspondingly large instance. Weaviate's Docker Compose setup is multi-container when you enable vectorizer or generative modules — the text2vec-openai and generative-openai modules run as separate services and require API key environment variables to be configured. For Kubernetes, the official Helm chart supports PersistentVolumeClaims for storage. Backup is built into the API: client.backup.create(backup_id="weekly-backup", backend="s3") streams data to an S3 bucket; restores are similarly API-driven. Weaviate exposes Prometheus-compatible metrics at /metrics on a dedicated monitoring port (enabled via PROMETHEUS_MONITORING_ENABLED), and the team maintains a reference Grafana dashboard. The operational investment is higher than Qdrant's, but the payoff is the native hybrid search and module ecosystem that eliminates external service dependencies.

For teams without dedicated infrastructure engineers, Qdrant Cloud and Weaviate Cloud (both offer free tiers) eliminate these operational concerns while maintaining the feature advantages of each database. The managed options use the same APIs as self-hosted, so migrating later if you outgrow managed pricing is straightforward.


Methodology

Latency and throughput benchmarks sourced from ann-benchmarks.com and Qdrant's published benchmark blog posts as of Q1 2026, measured against a 100M-vector dataset at 1536 dimensions on 32-core hardware; self-hosted results vary significantly by CPU, NVMe storage, and RAM configuration. NVMe storage is strongly recommended for on-disk configurations — rotational disks create unacceptable seek latency for HNSW graph traversal at the scales benchmarked here.

Cost estimates derived from Pinecone serverless, Weaviate Cloud, and Qdrant Cloud published pricing pages as of March 2026; self-hosted estimates assume equivalent bare-metal or VM costs and exclude engineering time. Ingress and egress costs for cloud-hosted deployments (querying from an application in a different cloud region) can add materially to total costs and are not included in the comparison table.

Code examples use Pinecone Python SDK v5.x, Qdrant client 1.9.x, and Weaviate v4 Python client. pgvector benchmarks based on PostgreSQL 16 with pgvector 0.7.x using HNSW index. All feature matrix entries verified against official documentation as of March 2026; both Qdrant and Weaviate release frequently — consult changelogs for recent additions. Multi-tenancy API behavior in Weaviate is documented under the v4 client; the v3 client uses a different method signature for tenant management. Hybrid search recall quality depends heavily on the alpha parameter and the tokenization strategy for BM25 — optimal values vary by domain and query distribution and should be tuned with a held-out evaluation set before deploying to production.


Browse all vector database and AI infrastructure APIs at APIScout.

Related: RAG Pipeline: Pinecone vs Weaviate vs pgvector · Embedding Models Compared: OpenAI vs Cohere vs Voyage · Vector Database APIs Compared (2026) · Supabase vs Neon vs PlanetScale 2026
