
OpenAI vs Voyage vs Cohere: Embedding Models 2026

APIScout Team

TL;DR

For most RAG and semantic search tasks: text-embedding-3-small from OpenAI wins on value — excellent performance, 1536 dimensions, $0.02/1M tokens. For maximum quality: Voyage AI's voyage-3-large leads the MTEB leaderboard. For zero-cost self-hosted: nomic-embed-text-v1.5 (via Ollama) is surprisingly competitive. Cohere's embed-v3 excels at multilingual and classification. The performance gap between these models is smaller than the gap between chunking strategies — your RAG pipeline's chunking and retrieval logic matters more than model choice.

Key Takeaways

  • OpenAI text-embedding-3-small: best value, 62.3 MTEB, $0.02/1M tokens, 1536 dims
  • OpenAI text-embedding-3-large: higher quality, 64.6 MTEB, $0.13/1M tokens, 3072 dims
  • Voyage AI voyage-3-large: top MTEB score 68.2, best raw quality, $0.12/1M tokens
  • Cohere embed-v3: multilingual (100+ languages), classification-optimized, $0.10/1M tokens
  • nomic-embed-text-v1.5: free self-hosted, 62.4 MTEB (beats OpenAI small!), 768 dims
  • Matryoshka embeddings: OpenAI and Voyage support truncating dimensions — cut vector storage 4-6x with minimal quality loss

Benchmarks Are a Starting Point, Not the Answer

MTEB scores are the closest thing to an objective comparison between embedding models, but they have important limitations. The benchmark covers 56 tasks across retrieval, classification, clustering, and semantic similarity — but your specific use case may not be representative of the benchmark distribution. A model that scores well on general web text retrieval may underperform on highly technical domain-specific content (legal, medical, scientific), where specialized models or fine-tuned embeddings would have a structural advantage.

Test on your own data: Before committing to an embedding model for a production RAG system, evaluate it on 50-100 sample queries from your actual use case. Create a small evaluation set: (query, expected_document_id) pairs. Run each candidate embedding model, build a temporary index, query it with your sample queries, and measure recall@k (how often the expected document appears in the top k results). A 2-point MTEB difference is interesting; a 15% recall difference on your own data is decisive.
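The recall@k measurement described above can be sketched as a small helper. The `retrieve` callback is a placeholder for whatever candidate pipeline you are evaluating (embed the query, search the temporary index, return top-k document ids) — it is an assumption here, not a real API:

```typescript
// Evaluate recall@k over (query, expected document id) pairs.
type EvalPair = { query: string; expectedDocId: string };

function recallAtK(
  pairs: EvalPair[],
  retrieve: (query: string, k: number) => string[], // top-k doc ids, ranked
  k: number,
): number {
  let hits = 0;
  for (const { query, expectedDocId } of pairs) {
    // A "hit" means the expected document appears anywhere in the top k.
    if (retrieve(query, k).includes(expectedDocId)) hits++;
  }
  return hits / pairs.length;
}
```

Run this once per candidate model over the same 50-100 pairs; the model with the highest recall@k on your own data wins, regardless of MTEB rank.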

Domain-specific models: For specialized domains, check the MTEB leaderboard for domain-specific categories. Voyage AI's voyage-code-3 is optimized for code retrieval and outperforms general models significantly on code search tasks. Cohere's multilingual model excels at cross-lingual retrieval. For medical text, fine-tuned BioBERT variants in the HuggingFace model library may outperform all the commercial options on MTEB's BEIR biomedical benchmarks. Build your evaluation set from representative examples before selecting a model — five hours of evaluation work can save months of regret.

MTEB Benchmark Scores (2026)

MTEB (Massive Text Embedding Benchmark) is the standard leaderboard for embedding models across 56 tasks:

Model                          MTEB Score   Dimensions   Cost/1M tokens       Context
voyage-3-large                 68.2         1024         $0.12                32K
voyage-3                       65.1         1024         $0.06                32K
text-embedding-3-large         64.6         3072         $0.13                8K
Cohere embed-english-v3.0      64.5         1024         $0.10                512
nomic-embed-text-v1.5          62.4         768          Free (self-hosted)   8K
text-embedding-3-small         62.3         1536         $0.02                8K
text-embedding-ada-002         61.0         1536         $0.10                8K
Cohere embed-multilingual-v3   60.1         1024         $0.10                512

Don't over-optimize on MTEB. A 2-point MTEB difference rarely matters in practice as much as retrieval strategy, chunk size, or query preprocessing.


OpenAI Embeddings: Default Choice

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Single embedding:
async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text.replace(/\n/g, ' '),  // Newlines hurt quality
  });
  return response.data[0].embedding;
}

// Batch embeddings (much more efficient):
async function embedBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts.map((t) => t.replace(/\n/g, ' ')),
  });
  // Response preserves order:
  return response.data.map((d) => d.embedding);
}

Matryoshka: Truncate Dimensions for Cost

OpenAI's v3 models support Matryoshka Representation Learning — you can truncate the output dimensions with minimal quality loss:

// Full dimensions: 1536 (text-embedding-3-small)
// Truncated: 256 dims — 6x smaller, ~1.5% quality loss

const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: text,
  dimensions: 256,  // Truncate to 256 dims
});

// Storage savings: 1536 float32 = 6KB per vector
//                  256 float32 = 1KB per vector
// At 1M documents: 6GB vs 1GB — huge difference for pgvector

// Useful dimensions for text-embedding-3-small:
// 1536 — full quality (default)
// 512  — good quality, 3x smaller
// 256  — acceptable quality, 6x smaller

Voyage AI: Highest Quality

// npm install voyageai
import { VoyageAIClient } from 'voyageai';

const voyage = new VoyageAIClient({ apiKey: process.env.VOYAGE_API_KEY });

// Basic embedding:
const response = await voyage.embed({
  input: ['How do I connect to a database?'],
  model: 'voyage-3-large',
});

const embedding = response.data[0].embedding;  // 1024 dims

// Batch:
const batchResponse = await voyage.embed({
  input: texts,
  model: 'voyage-3-large',
  inputType: 'document',  // 'document' for corpus, 'query' for search queries
  truncation: true,         // Truncate at context limit instead of error
});

Voyage Asymmetric Embeddings

Voyage supports asymmetric search — different representations for documents vs queries:

// When indexing documents:
const docEmbeddings = await voyage.embed({
  input: documentTexts,
  model: 'voyage-3',
  inputType: 'document',  // Optimized for documents
});

// When searching:
const queryEmbedding = await voyage.embed({
  input: [userQuery],
  model: 'voyage-3',
  inputType: 'query',    // Optimized for queries
});

// This matters more than it seems:
// A query "how to connect" and a document "database connection tutorial"
// mean the same thing but look different textually
// Asymmetric models bridge this gap better

Voyage models:

  • voyage-3-large: highest quality, $0.12/1M
  • voyage-3: balanced, $0.06/1M
  • voyage-3-lite: fast and cheap, $0.02/1M — competitive with OpenAI small
  • voyage-code-3: optimized for code search, $0.18/1M

Cohere Embeddings: Multilingual + Classification

// npm install cohere-ai
import { CohereClient } from 'cohere-ai';

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY });

// English embeddings:
const response = await cohere.v2.embed({
  texts: [
    'What is machine learning?',
    'Explain neural networks',
  ],
  model: 'embed-english-v3.0',
  inputType: 'search_document',  // For indexing; 'search_query' for querying
  embeddingTypes: ['float'],
});

const embeddings = response.embeddings.float!;

// Multilingual — 100+ languages:
const multilingualResponse = await cohere.v2.embed({
  texts: [
    'How are you?',         // English
    '¿Cómo estás?',        // Spanish
    'Comment allez-vous?',  // French
    '你好吗?',               // Chinese
  ],
  model: 'embed-multilingual-v3.0',
  inputType: 'search_document',
  embeddingTypes: ['float'],
});

// All 4 texts will have similar embeddings for similar meanings
// Great for global apps where users query in different languages

// Cohere int8 embeddings — 4x smaller, minimal quality loss:
const int8Response = await cohere.v2.embed({
  texts: documents,
  model: 'embed-english-v3.0',
  inputType: 'search_document',
  embeddingTypes: ['int8'],  // 1024 int8 vs 1024 float32 = 4x smaller
});

// Great for: very large corpora where storage is a concern

Open Source: nomic-embed via Ollama (Free)

# Install Ollama, then pull the model:
ollama pull nomic-embed-text

# Serve (already running with `ollama serve`):
# http://localhost:11434

// nomic-embed-text via Ollama API:
async function embedWithOllama(text: string): Promise<number[]> {
  const response = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'nomic-embed-text',
      prompt: text,
    }),
  });
  const data = await response.json();
  return data.embedding;  // 768 dimensions
}

// Or use Ollama's OpenAI-compatible endpoint:
const openaiCompatible = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',  // Ignored but required
});

const response = await openaiCompatible.embeddings.create({
  model: 'nomic-embed-text',
  input: text,
});

nomic-embed-text-v1.5 supports Matryoshka dimensions (768, 512, 256, 128, 64) — same feature as OpenAI.
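When a provider returns full-width vectors, Matryoshka truncation can also be done client-side: slice the first N dimensions, then re-normalize so cosine and dot-product search still behave correctly. A minimal sketch (nothing provider-specific here):

```typescript
// Truncate a Matryoshka embedding to `dims` and L2-normalize the result.
// Without re-normalization, similarity scores against other truncated
// vectors would be skewed by the lost magnitude.
function truncateEmbedding(vec: number[], dims: number): number[] {
  const t = vec.slice(0, dims);
  const norm = Math.sqrt(t.reduce((s, x) => s + x * x, 0)) || 1;
  return t.map((x) => x / norm);
}
```

This only works for models trained with MRL (OpenAI v3, Voyage, nomic v1.5) — truncating an ordinary embedding this way discards information at random.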

Use open source embeddings when:

  • Running locally (dev, privacy-sensitive data)
  • Batch processing large corpora ($0 vs $200+ for 10B tokens)
  • Self-hosted infrastructure where cloud API calls are not possible
  • Experimenting without API costs

Deployment considerations for self-hosted: Running nomic-embed-text on CPU (via Ollama on a developer machine) produces 768-dimensional embeddings at roughly 50-200 texts/second depending on text length and hardware. For production batch indexing of large corpora, a GPU is strongly recommended — a single A10G GPU processes nomic-embed-text at 5,000-10,000 texts/second, making a 10M document corpus index feasible in hours rather than days. For query-time embedding (real-time search), CPU performance is usually sufficient since individual queries embed in under 10ms.


Cost at Scale

Embedding 10M documents (avg 200 tokens each = 2B tokens):

Model                    Cost   Notes
text-embedding-3-small   $40    Best value commercial
voyage-3-lite            $40    Competitive alternative
text-embedding-ada-002   $200   Old model, avoid
text-embedding-3-large   $260   Only if quality critical
nomic-embed-text         $0     Self-hosted

For ongoing use (1M queries/day at ~20 tokens per query, roughly 600M tokens/month):

Model                    Monthly Cost   Notes
text-embedding-3-small   ~$12           Very cheap
voyage-3                 ~$36           3x more
text-embedding-3-large   ~$78           6.5x more

Query embedding cost is almost always negligible — focus on ingestion cost. A typical SaaS product with 100,000 users who each perform 5 searches per day generates 500,000 query embeddings per day, or roughly 15M per month. At ~20 tokens per query, that's ~300M tokens — about $6/month at text-embedding-3-small pricing, completely negligible. The cost that compounds is initial corpus ingestion and re-ingestion when you switch models or re-chunk your documents.

Re-embedding cost planning: If you index 1M documents at 300 tokens average with text-embedding-3-small ($0.02/1M tokens), initial ingestion costs $6. If you later decide to switch to Voyage AI voyage-3 ($0.06/1M tokens), re-indexing costs $18. The switching cost is low enough that experimenting with model changes is feasible without major cost concern — evaluate first, then commit to the model that performs best on your retrieval benchmark before large-scale ingestion.
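The arithmetic above generalizes to a one-liner worth keeping next to your ingestion scripts (prices are the per-1M-token figures quoted in this article):

```typescript
// Cost in USD to embed `docs` documents of `avgTokens` tokens each,
// at a price quoted per 1M tokens.
function embeddingCostUSD(docs: number, avgTokens: number, pricePer1M: number): number {
  return (docs * avgTokens / 1_000_000) * pricePer1M;
}

embeddingCostUSD(1_000_000, 300, 0.02); // text-embedding-3-small: $6
embeddingCostUSD(1_000_000, 300, 0.06); // voyage-3: $18
```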


Practical Recommendation

Use text-embedding-3-small if:
  → You're starting a new project
  → Budget matters
  → English-only content
  → Good RAG quality is sufficient (vs best quality)

Use voyage-3 if:
  → You need maximum retrieval quality
  → Code search (voyage-code-3)
  → Long documents (32K context vs 8K for OpenAI)

Use cohere embed-multilingual if:
  → Users query in multiple languages
  → Classification alongside search
  → Building a multilingual search engine

Use nomic-embed-text if:
  → Self-hosting is a requirement
  → Processing huge corpora locally
  → Privacy-sensitive documents
  → Development/testing ($0 cost)

Skip text-embedding-ada-002:
  → Legacy model — always use v3 models instead
  → Strictly worse than text-embedding-3-small and costs 5x more

Find and compare embedding APIs at APIScout.

Choosing the Right Embedding Dimension

Embedding dimensions represent a fundamental tradeoff between quality, storage cost, and inference speed. Higher dimensions capture more semantic nuance but cost more to store in a vector database and take slightly longer to compare during search. Lower dimensions are faster and cheaper but may miss subtle distinctions between similar passages.

The practical decision is simpler than it sounds. For most English-language RAG applications over knowledge bases, technical documentation, or customer support content, 512-1024 dimensions is the sweet spot: enough resolution to distinguish semantically similar but meaningfully different passages, without paying the storage premium of 3072-dimensional vectors. OpenAI's text-embedding-3-small at 1536 dimensions can be truncated to 512 via the dimensions parameter with less than 2% quality loss on most benchmarks — this makes it the most flexible option in practice.

The cases where higher dimensions matter: code search (code has precise semantic structure where 256 dimensions loses important distinctions between similar functions), multilingual cross-lingual retrieval (where the embedding must bridge between languages), and dense retrieval over highly technical domains (medicine, law, chemistry) where domain-specific vocabulary makes semantic distinctions more subtle.

Matryoshka Representation Learning (MRL), which OpenAI and Voyage both implement, means the first N dimensions of a high-dimensional embedding contain the most important semantic signal. This is why truncating from 1536 to 512 dimensions works well — you're not losing random information, you're dropping the dimensions that capture the finest-grained distinctions. Quality degrades smoothly as you retain fewer dimensions, rather than falling off a cliff below some threshold.

Building a Provider-Agnostic Embedding Layer

Embedding models change rapidly — today's MTEB leader will likely be displaced within 12 months. Building a thin abstraction over your embedding provider protects your vector index from provider lock-in and makes A/B testing different models practical.

The core interface is simple: embed(texts: string[]): Promise<number[][]>. Implement it once for each provider you want to support, wrap your chosen provider in the implementation, and your RAG pipeline depends on the interface rather than a specific SDK. When you want to test Voyage AI against your existing OpenAI embeddings, you swap the implementation without touching any retrieval code.

The complication is re-embedding: if you switch embedding models, you must re-embed your entire corpus with the new model before you can use it, because OpenAI embeddings and Voyage embeddings are in different vector spaces and can't be searched together. Plan for this by keeping a record of which model and dimension count was used for each batch of embeddings in your vector database metadata. When you switch models, re-embed incrementally (most recent content first) or run both models in parallel during the transition period.
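A minimal sketch of that interface follows. The `HashEmbedder` stub is hypothetical (useful for tests and local dev, not a real provider); a production implementation would wrap the OpenAI, Voyage, or Cohere SDK call inside `embed` and report its actual model name and dimension count:

```typescript
// Provider-agnostic embedding interface. Record `model` and `dimensions`
// in your vector-DB metadata so you know which space each vector lives in.
interface Embedder {
  readonly model: string;
  readonly dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Deterministic stub implementation (hypothetical, for tests/dev only):
class HashEmbedder implements Embedder {
  readonly model = 'hash-stub';
  readonly dimensions = 4;
  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((t) => {
      // Bucket character codes into a fixed-size vector, then normalize.
      const v = new Array(this.dimensions).fill(0);
      for (let i = 0; i < t.length; i++) {
        v[i % this.dimensions] += t.charCodeAt(i);
      }
      const norm = Math.sqrt(v.reduce((s: number, x: number) => s + x * x, 0)) || 1;
      return v.map((x: number) => x / norm);
    });
  }
}
```

Your RAG pipeline takes an `Embedder`, never an SDK client — swapping OpenAI for Voyage then means changing one constructor call plus a re-embedding pass.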

The Chunking Factor: Why Model Choice Often Matters Less

The performance gap between embedding models on the MTEB benchmark (2-5 points) is real but often dwarfed by the impact of chunking strategy. In practice, a well-chunked corpus with a 62-point MTEB model routinely outperforms a poorly chunked corpus with a 68-point model.

The chunking variable most people get wrong: chunk size determines what semantic unit the embedding represents. A 500-token chunk encodes a multi-paragraph idea; a 100-token chunk encodes a single paragraph; a 2000-token chunk encodes a section. The right chunk size depends on your query pattern. If users ask specific factual questions ("what is the maximum file size?"), smaller chunks (100-200 tokens) with high precision work better. If users ask conceptual questions ("how does the authentication system work?"), larger chunks (500-800 tokens) that capture broader context retrieve better.
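As a concrete baseline, here is a minimal fixed-size chunker with overlap. It counts words rather than tokens to stay dependency-free — production code would count tokens with the model's tokenizer (e.g. tiktoken) — but the structure is the same:

```typescript
// Split text into chunks of `chunkWords` words, overlapping by
// `overlapWords` so an idea spanning a boundary appears in both chunks.
function chunkText(text: string, chunkWords: number, overlapWords: number): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const step = chunkWords - overlapWords;
  if (step <= 0) throw new Error('overlap must be smaller than chunk size');
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += step) {
    chunks.push(words.slice(i, i + chunkWords).join(' '));
    if (i + chunkWords >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Tune `chunkWords` against your own recall benchmark — per the section above, that single knob usually moves retrieval quality more than swapping embedding models does.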

Context window matters for Voyage: Voyage AI's 32K token context window (vs OpenAI's 8K) is genuinely useful for documents that need to be embedded as a unit — legal contracts, academic papers, long technical specifications — where splitting the document would lose the cross-referential meaning. For typical web content or knowledge base articles, 8K is sufficient.

Methodology

MTEB (Massive Text Embedding Benchmark) scores are from the mteb/leaderboard HuggingFace repository; the leaderboard updates frequently as new models are submitted, and scores can shift by 0.5-1 point between evaluations. Pricing is sourced from each provider's public pricing pages as of early 2026. The Matryoshka dimension truncation quality loss estimate (less than 2% on typical tasks for 3x dimension reduction) is derived from OpenAI's MRL blog post and Voyage's technical documentation; actual quality loss is task-dependent and should be validated on your specific retrieval benchmark. Self-hosted nomic-embed-text performance depends heavily on GPU availability — on CPU, throughput drops significantly compared to cloud API providers. The nomic-embed-text-v1.5 version adds task_type support similar to Voyage's input_type parameter, which improves retrieval quality over the base v1 model.



Compare OpenAI and Cohere on APIScout.

Related: Cohere vs OpenAI: Enterprise NLP API Comparison, Anthropic MCP vs OpenAI Plugins vs Gemini Extensions, Cloudflare Workers AI vs AWS Bedrock vs Azure OpenAI
