
Cloudflare Workers AI vs AWS Bedrock vs Azure OpenAI 2026

By APIScout Team

TL;DR

If you're already on AWS, use Bedrock. Already on Azure, use Azure OpenAI. Building edge apps, use Cloudflare Workers AI. These three managed AI platforms differ more in philosophy than capability: Cloudflare runs inference at 300+ edge locations (sub-50ms globally), AWS Bedrock gives you the widest enterprise model catalog with built-in governance, and Azure OpenAI gives you GPT-4o with enterprise SLAs and compliance certifications. None of them is best in isolation — your existing cloud infrastructure almost always determines the right choice.

Key Takeaways

  • Cloudflare Workers AI: fastest globally distributed inference, 50+ models, $0.011-0.055/1K neurons, no cold starts
  • AWS Bedrock: 50+ models (Llama, Mistral, Claude, Titan), pay-per-token, IAM integration, best for existing AWS infrastructure
  • Azure OpenAI: GPT-4o + o1 with enterprise SLAs, private endpoints, SOC2/HIPAA compliance, best for Microsoft shops
  • Latency: Cloudflare ~30-50ms globally, AWS/Azure ~100-300ms from non-primary regions
  • Enterprise compliance: Azure OpenAI wins (HIPAA, FedRAMP), Bedrock close behind
  • Multi-model access: Bedrock wins (Claude, Llama, Mistral, Titan, Cohere, AI21 all unified)

Cloudflare Workers AI: Inference at the Edge

Best for: globally distributed apps, low-latency requirements, serverless-first architecture, Cloudflare Workers projects

Cloudflare runs AI inference on their global network — the same infrastructure that handles ~20% of web traffic. When your user is in Tokyo, the model runs in Tokyo. In Frankfurt, Frankfurt.

// workers-ai.ts — Running inference at the edge:
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = await request.json<{ prompt: string }>();

    // Text generation:
    const response = await env.AI.run('@cf/meta/llama-3.3-70b-instruct', {
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: prompt },
      ],
      max_tokens: 512,
      stream: false,
    });

    return Response.json(response);
  },
};
// Streaming response from Workers AI:
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const stream = await env.AI.run('@cf/meta/llama-3.3-70b-instruct', {
      messages: [{ role: 'user', content: 'Explain DNS in 3 sentences.' }],
      stream: true,
    });

    // stream is a ReadableStream of SSE chunks:
    return new Response(stream, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
      },
    });
  },
};
# wrangler.toml:
name = "ai-worker"
main = "src/index.ts"
compatibility_date = "2026-01-01"

[ai]
binding = "AI"
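On the consuming side, the streamed response arrives as server-sent events. A minimal parser sketch, assuming the `data: {"response": "..."}` chunk shape Workers AI emits for text models (the helper names here are ours, not part of any SDK):

```typescript
// Parse one SSE line from a Workers AI stream into its text token.
// Returns null for non-data lines and the final [DONE] marker.
function parseSSELine(line: string): string | null {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice('data: '.length).trim();
  if (payload === '[DONE]') return null;
  try {
    const parsed = JSON.parse(payload) as { response?: string };
    return parsed.response ?? null;
  } catch {
    return null;  // ignore malformed or partial chunks
  }
}

// Accumulate a full completion from raw SSE text:
function collectStream(sseText: string): string {
  return sseText
    .split('\n')
    .map(parseSSELine)
    .filter((t): t is string => t !== null)
    .join('');
}
```

In a real client you would feed `parseSSELine` from a `TextDecoder` over the response body rather than a pre-buffered string.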

Workers AI Models (2026)

Text Generation:
  @cf/meta/llama-3.3-70b-instruct     — Latest Llama, best quality
  @cf/meta/llama-3.1-8b-instruct      — Fast, cheap, good for simple tasks
  @cf/mistral/mistral-7b-instruct-v0.2 — Good general purpose
  @cf/qwen/qwen1.5-7b-chat-awq        — Efficient, multilingual

Code Generation:
  @cf/deepseek-ai/deepseek-coder-6.7b-instruct — Code-specialized
  @hf/thebloke/deepseek-coder-6.7b-instruct-awq

Text Embeddings:
  @cf/baai/bge-small-en-v1.5         — 384 dims, fast
  @cf/baai/bge-large-en-v1.5         — 1024 dims, better quality

Image Generation:
  @cf/stabilityai/stable-diffusion-xl-base-1.0
  @cf/bytedance/stable-diffusion-xl-lightning

Vision:
  @cf/llava-hf/llava-1.5-7b-hf       — Vision + text

Speech:
  @cf/openai/whisper                  — Transcription

Workers AI Pricing

Billing unit: "Neurons" (compute units)

Text generation (Llama 70B):
  Input:  ~0.027 neurons/token
  Output: ~0.027 neurons/token
  Cost:   $0.011/1K neurons
  Effective: ~$0.30-0.60/M tokens (varies by model)

Free tier: 10,000 neurons/day
Workers Paid plan ($5/month): 10M neurons/month included
Beyond: $0.011/1K neurons

Workers AI is significantly cheaper for medium loads because the $5/month Workers plan includes a generous neuron allocation.
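To see where overage charges kick in, the billing model above can be sketched as a small estimator. The constants are taken from the pricing block and are approximations that vary by model:

```typescript
// Rough monthly cost estimator for Workers AI on the $5/month Paid plan.
const NEURONS_PER_TOKEN_70B = 0.027;   // ~Llama 70B rate, varies by model
const USD_PER_1K_NEURONS = 0.011;      // overage price
const INCLUDED_NEURONS = 10_000_000;   // Workers Paid allocation
const PLAN_BASE_USD = 5;

function estimateMonthlyCost(
  tokensPerMonth: number,
  neuronsPerToken: number = NEURONS_PER_TOKEN_70B,
): number {
  const neurons = tokensPerMonth * neuronsPerToken;
  const overage = Math.max(0, neurons - INCLUDED_NEURONS);
  return PLAN_BASE_USD + (overage / 1000) * USD_PER_1K_NEURONS;
}
```

Until consumption exceeds the included allocation, the estimate stays flat at the $5 base price.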


AWS Bedrock: Enterprise Model Catalog

Best for: AWS-heavy organizations, teams needing model governance, fine-tuning requirements, RAG with S3 data

AWS Bedrock is not just one model — it's a managed gateway to 50+ foundation models from Meta, Mistral, Anthropic, AI21, Cohere, and Amazon's own Titan models, all accessible with the same IAM-authenticated API.

// AWS Bedrock with @aws-sdk/client-bedrock-runtime:
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
  InvokeModelWithResponseStreamCommand,
} from '@aws-sdk/client-bedrock-runtime';

const client = new BedrockRuntimeClient({
  region: 'us-east-1',
  // Uses IAM role or env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
});

// Invoke Llama 3.3 70B:
async function invokeLlama(prompt: string) {
  const payload = {
    prompt: `<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n${prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>`,
    max_gen_len: 512,
    temperature: 0.7,
    top_p: 0.9,
  };

  const command = new InvokeModelCommand({
    modelId: 'meta.llama3-3-70b-instruct-v1:0',
    body: JSON.stringify(payload),
    contentType: 'application/json',
    accept: 'application/json',
  });

  const response = await client.send(command);
  const decoded = JSON.parse(Buffer.from(response.body).toString('utf-8'));
  return decoded.generation;
}
// Bedrock Converse API — unified format across ALL models:
import { ConverseCommand } from '@aws-sdk/client-bedrock-runtime';

async function converse(modelId: string, userMessage: string) {
  const command = new ConverseCommand({
    modelId,
    messages: [
      { role: 'user', content: [{ text: userMessage }] },
    ],
    inferenceConfig: {
      maxTokens: 512,
      temperature: 0.7,
    },
  });

  const response = await client.send(command);
  return response.output?.message?.content?.[0]?.text;
}

// Same API works for ALL Bedrock models:
await converse('meta.llama3-3-70b-instruct-v1:0', 'Hello');
await converse('mistral.mistral-7b-instruct-v0:2', 'Hello');
await converse('anthropic.claude-3-5-sonnet-20241022-v2:0', 'Hello');
await converse('amazon.titan-text-premier-v1:0', 'Hello');
// Bedrock streaming with Converse:
import { ConverseStreamCommand } from '@aws-sdk/client-bedrock-runtime';

async function* converseStream(modelId: string, message: string) {
  const command = new ConverseStreamCommand({
    modelId,
    messages: [{ role: 'user', content: [{ text: message }] }],
  });

  const response = await client.send(command);

  for await (const event of response.stream!) {
    if (event.contentBlockDelta?.delta?.text) {
      yield event.contentBlockDelta.delta.text;
    }
  }
}

// Usage:
for await (const chunk of converseStream('meta.llama3-3-70b-instruct-v1:0', 'Explain TLS')) {
  process.stdout.write(chunk);
}

Bedrock Model Catalog (Key Models)

Provider  | Model ID                                  | Notes
----------|-------------------------------------------|--------------------------
Meta      | meta.llama3-3-70b-instruct-v1:0           | Best open model on Bedrock
Mistral   | mistral.mistral-large-2402-v1:0           | Strong reasoning
Anthropic | anthropic.claude-3-5-sonnet-20241022-v2:0 | Best quality, higher cost
Amazon    | amazon.titan-text-premier-v1:0            | AWS-native, good for RAG
Cohere    | cohere.command-r-plus-v1:0                | Best for long-context RAG
AI21      | ai21.jamba-1-5-large-v1:0                 | Long context (256K)

Bedrock Knowledge Bases (RAG Built-In)

Bedrock's killer feature for enterprise: managed RAG with S3 data sources.

import {
  BedrockAgentRuntimeClient,
  RetrieveAndGenerateCommand,
} from '@aws-sdk/client-bedrock-agent-runtime';

const agentClient = new BedrockAgentRuntimeClient({ region: 'us-east-1' });

async function ragQuery(query: string, knowledgeBaseId: string) {
  const command = new RetrieveAndGenerateCommand({
    input: { text: query },
    retrieveAndGenerateConfiguration: {
      type: 'KNOWLEDGE_BASE',
      knowledgeBaseConfiguration: {
        knowledgeBaseId,
        modelArn: 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0',
        retrievalConfiguration: {
          vectorSearchConfiguration: { numberOfResults: 5 },
        },
      },
    },
  });

  const response = await agentClient.send(command);
  return {
    answer: response.output?.text,
    citations: response.citations,
  };
}

Bedrock Pricing

On-Demand pricing (us-east-1):

Model                        | Input $/1M  | Output $/1M
-----------------------------|------------|------------
Llama 3.3 70B Instruct       | $0.72      | $0.99
Mistral Large (2402)         | $4.00      | $12.00
Claude 3.5 Sonnet            | $3.00      | $15.00
Amazon Titan Text Premier    | $0.50      | $1.50
Cohere Command R+            | $3.00      | $15.00

Provisioned Throughput: pre-buy capacity for consistent high-volume use
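Given the table above, comparing the on-demand bill across models is simple arithmetic. A sketch with prices hard-coded from the us-east-1 table (verify against current Bedrock pricing before relying on it):

```typescript
// On-demand Bedrock cost comparison, USD per 1M tokens (us-east-1).
interface ModelPrice { inputPerM: number; outputPerM: number }

const BEDROCK_PRICES: Record<string, ModelPrice> = {
  'meta.llama3-3-70b-instruct-v1:0':            { inputPerM: 0.72, outputPerM: 0.99 },
  'anthropic.claude-3-5-sonnet-20241022-v2:0':  { inputPerM: 3.00, outputPerM: 15.00 },
  'amazon.titan-text-premier-v1:0':             { inputPerM: 0.50, outputPerM: 1.50 },
};

// inputTokensM / outputTokensM are in millions of tokens per month.
function monthlyCost(modelId: string, inputTokensM: number, outputTokensM: number): number {
  const p = BEDROCK_PRICES[modelId];
  if (!p) throw new Error(`No price data for ${modelId}`);
  return inputTokensM * p.inputPerM + outputTokensM * p.outputPerM;
}
```

Because the Converse API keeps the calling code identical across models, swapping the `modelId` and re-running this estimate is often the fastest cost experiment available.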

Azure OpenAI: Enterprise GPT with Microsoft Compliance

Best for: organizations requiring HIPAA/FedRAMP compliance, Microsoft/Azure shops, GPT-4o access with enterprise SLAs, government and healthcare

Azure OpenAI is Microsoft-hosted OpenAI models — the exact same models (GPT-4o, o1, DALL-E 3) but with Azure's enterprise wrapper: private endpoints, customer-managed encryption keys, content filtering policies, audit logs, and compliance certifications.

// Azure OpenAI uses the same SDK as OpenAI — just different endpoint:
import OpenAI from 'openai';

const azureClient = new OpenAI({
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  // AZURE_OPENAI_ENDPOINT here is the bare resource host,
  // e.g. my-resource.openai.azure.com (no scheme)
  baseURL: `https://${process.env.AZURE_OPENAI_ENDPOINT}/openai/deployments/${process.env.AZURE_OPENAI_DEPLOYMENT_NAME}`,
  defaultQuery: { 'api-version': '2024-12-01-preview' },
  defaultHeaders: { 'api-key': process.env.AZURE_OPENAI_API_KEY },
});

// Or use the official Azure OpenAI SDK for stronger typing:
import { AzureOpenAI } from 'openai';

const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  apiKey: process.env.AZURE_OPENAI_API_KEY!,
  apiVersion: '2024-12-01-preview',
  deployment: 'gpt-4o',  // Your deployment name
});
// Chat completion (identical to OpenAI SDK):
const response = await client.chat.completions.create({
  model: 'gpt-4o',     // Uses your deployment name
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Summarize this document...' },
  ],
  max_tokens: 1024,
  temperature: 0.7,
});

console.log(response.choices[0].message.content);
// Azure OpenAI with function calling (same as OpenAI):
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'search_documents',
      description: 'Search enterprise document store',
      strict: true,
      parameters: {
        type: 'object',
        properties: {
          query: { type: 'string' },
          department: { type: 'string', enum: ['legal', 'hr', 'finance', 'engineering'] },
          date_range: { type: 'string', description: 'ISO date range, e.g. 2025-01-01/2025-12-31' },
        },
        required: ['query'],
        additionalProperties: false,
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Find all legal contracts from Q4 2025' }],
  tools,
  tool_choice: 'auto',
});
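When the model decides to call a tool, the response carries `tool_calls` entries whose arguments arrive as a JSON string that your code must parse and execute. A dispatch sketch; the handler body is a placeholder for a real document-search backend:

```typescript
// Shape of a tool call as returned in choices[0].message.tool_calls:
type ToolCall = { id: string; function: { name: string; arguments: string } };

// Map tool names to implementations. The search_documents handler here
// is a stub standing in for your enterprise search service.
const handlers: Record<string, (args: Record<string, unknown>) => string> = {
  search_documents: (args) =>
    `searched "${args.query}" in ${args.department ?? 'all departments'}`,
};

// Execute each requested tool and build the tool-role messages to send
// back to the model on the next turn.
function runToolCalls(toolCalls: ToolCall[]): { tool_call_id: string; content: string }[] {
  return toolCalls.map((tc) => {
    const handler = handlers[tc.function.name];
    const args = JSON.parse(tc.function.arguments); // arguments are a JSON string
    return {
      tool_call_id: tc.id,
      content: handler ? handler(args) : `unknown tool: ${tc.function.name}`,
    };
  });
}
```

The results go back as `{ role: 'tool', tool_call_id, content }` messages in a follow-up `chat.completions.create` call so the model can compose its final answer.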
// Azure OpenAI Assistants API (full stateful conversations):
const assistant = await client.beta.assistants.create({
  name: 'Enterprise Support Agent',
  instructions: 'You help employees with HR and IT questions.',
  model: 'gpt-4o',
  tools: [{ type: 'file_search' }, { type: 'code_interpreter' }],
});

const thread = await client.beta.threads.create();

await client.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: 'What is the vacation policy for US employees?',
});

const run = await client.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

Azure OpenAI Models (2026)

Model                  | Deployment             | Notes
-----------------------|------------------------|----------------------
GPT-4o                 | gpt-4o                 | Latest, best quality
GPT-4o mini            | gpt-4o-mini            | Cheaper, fast
o1                     | o1                     | Reasoning model
o3-mini                | o3-mini                | Fast reasoning
DALL-E 3               | dall-e-3               | Image generation
Whisper                | whisper                | Speech transcription
text-embedding-3-large | text-embedding-3-large | Best embeddings

Enterprise Features Only Azure Has

✅ Private Endpoints — Your traffic never leaves Azure backbone
✅ Customer-Managed Keys — Encrypt with your own Azure Key Vault keys
✅ Content Filtering — Customizable harm categories per deployment
✅ Managed Identity — No API keys needed, uses Azure AD
✅ Compliance: SOC2, HIPAA, FedRAMP High, ISO 27001, GDPR
✅ No data training: Your data is NOT used to train OpenAI models
✅ Regional deployment: Choose EU regions for data residency
✅ 99.9% uptime SLA (the direct OpenAI API publishes no comparable SLA)

Azure Authentication Without API Keys

// Production pattern: use Managed Identity (no API keys):
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import { AzureOpenAI } from 'openai';

const credential = new DefaultAzureCredential();
const scope = 'https://cognitiveservices.azure.com/.default';
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

const client = new AzureOpenAI({
  endpoint: process.env.AZURE_OPENAI_ENDPOINT!,
  azureADTokenProvider,  // No API key needed when running in Azure
  apiVersion: '2024-12-01-preview',
  deployment: 'gpt-4o',
});

Side-by-Side Comparison

                 | Cloudflare Workers AI     | AWS Bedrock           | Azure OpenAI
-----------------|---------------------------|-----------------------|------------------------
Best for         | Edge latency, global apps | AWS orgs, multi-model | Microsoft orgs, GPT-4o
Latency          | ~30-50ms globally         | ~100-300ms            | ~100-200ms (US)
Model count      | 50+                       | 50+                   | OpenAI models only
GPT-4o access    | ❌                        | ❌                    | ✅
Llama access     | ✅                        | ✅                    | ❌
Claude access    | ❌                        | ✅                    | ❌
HIPAA            | ❌                        | ✅                    | ✅
Private endpoint | ✅ Workers                | ✅ VPC                | ✅ Private Link
Fine-tuning      | Limited                   | ✅                    | ✅ (GPT-4o mini)
RAG built-in     | ✅ Vectorize              | ✅ Knowledge Bases    | ✅ AI Search
Free tier        | 10K neurons/day           | No free tier          | $200 credit
Pricing model    | Per neuron                | Per token             | Per token

Decision Framework

Choose CLOUDFLARE WORKERS AI if:
  → Your app already runs on Cloudflare Workers
  → Globally distributed users, latency is critical
  → You need inference with zero cold starts
  → Simple use case: Llama, embeddings, image gen

Choose AWS BEDROCK if:
  → Your infrastructure is on AWS
  → You need access to multiple model providers (Llama, Claude, Mistral, Titan)
  → You want managed RAG with S3 data sources
  → Enterprise governance and model access control is required
  → You want Claude without going to Anthropic directly

Choose AZURE OPENAI if:
  → You need GPT-4o or o1 specifically
  → HIPAA, FedRAMP, or government compliance required
  → Your organization is Microsoft-certified / Azure-native
  → You need private endpoints and no data training guarantee
  → Existing Azure AD for authentication (Managed Identity)

Use NONE of these if:
  → You're a startup/indie dev (use OpenAI/Anthropic direct — simpler, cheaper)
  → You need bleeding-edge models first (hyperscalers lag direct providers by weeks)
  → You're optimizing for cost (direct APIs are cheaper, no cloud markup)

Code: Switching Between Providers

Azure OpenAI is a drop-in for the stock OpenAI SDK, Workers AI exposes an OpenAI-compatible REST endpoint per account, and Bedrock needs an OpenAI-compatible gateway in front of its native SDK:

// Universal client factory:
import OpenAI from 'openai';

type CloudProvider = 'cloudflare' | 'bedrock' | 'azure' | 'openai';

function createClient(provider: CloudProvider): OpenAI {
  switch (provider) {
    case 'azure':
      return new OpenAI({
        apiKey: process.env.AZURE_OPENAI_API_KEY,
        baseURL: `${process.env.AZURE_OPENAI_ENDPOINT}/openai/deployments/${process.env.AZURE_OPENAI_DEPLOYMENT}`,
        defaultQuery: { 'api-version': '2024-12-01-preview' },
      });

    case 'openai':
      return new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

    // Workers AI exposes an OpenAI-compatible endpoint per account:
    case 'cloudflare':
      return new OpenAI({
        apiKey: process.env.CLOUDFLARE_API_TOKEN,
        baseURL: `https://api.cloudflare.com/client/v4/accounts/${process.env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
      });

    // Bedrock's native SDK is not OpenAI-compatible; route through an
    // OpenAI-compatible gateway such as LiteLLM and point the client there:
    case 'bedrock':
      return new OpenAI({
        apiKey: 'unused',  // real auth happens at the gateway via AWS credentials
        baseURL: process.env.BEDROCK_GATEWAY_URL!,  // e.g. your LiteLLM proxy
      });

    default:
      throw new Error(`Unknown provider: ${provider}`);
  }
}

Enterprise Governance and Compliance Deep Dive

Enterprise AI adoption is often blocked not by technical capability but by compliance requirements — security teams need audit trails, legal needs data processing agreements, and IT needs network isolation guarantees. The three platforms differ substantially here.

Azure OpenAI holds the strongest compliance posture in 2026. It is the only managed AI service with FedRAMP High authorization, making it the only viable option for US federal government workloads and many state government use cases. HIPAA Business Associate Agreements (BAAs) are available under Enterprise Agreements — your PHI and ePHI data is contractually protected from being used in model training. Azure Private Link ensures traffic from your VNet to Azure OpenAI never traverses the public internet, which is a requirement in most healthcare and financial regulatory frameworks. Beyond the certifications, Azure OpenAI provides customer-managed encryption keys through Azure Key Vault, giving enterprises key custody — a requirement for clients who cannot allow the cloud provider to control encryption material. Microsoft Purview integration centralizes audit logging for all Azure OpenAI API calls, giving compliance teams a single pane of glass for AI usage monitoring across the organization. Content filtering policies are configurable per deployment: you can raise or lower harm thresholds for specific use cases, with audit records of every policy change.

AWS Bedrock offers strong but not equivalent compliance coverage. It carries SOC 2 Type II, HIPAA, ISO 27001, and PCI DSS certifications. Critically, Bedrock does not hold FedRAMP High authorization as of Q1 2026, which disqualifies it from some federal workloads. For commercial enterprise use, Bedrock's governance tooling is robust: model access requires explicit opt-in per foundation model via the Bedrock console, preventing teams from accidentally invoking models they haven't reviewed. IAM policies provide fine-grained control over which models specific roles can invoke, and Service Control Policies (SCPs) at the AWS organization level can enforce organization-wide model allowlists. Bedrock Guardrails sits as a content policy layer above all foundation models — you define topic deny lists, PII detection and redaction rules, and harm filters that apply regardless of which model is invoked. AWS PrivateLink endpoints keep Bedrock runtime traffic on the AWS backbone. AWS CloudTrail logs all Bedrock API calls with request metadata, though response content logging requires additional configuration.

Cloudflare Workers AI is not suitable for regulated industries. It holds no FedRAMP authorization, offers no HIPAA BAA, and Cloudflare's data processing agreements, while comprehensive for most commercial applications, do not meet the requirements of HIPAA, FedRAMP, or PCI DSS compliance frameworks. For unregulated use cases — consumer apps, internal developer tooling, performance-sensitive applications — Cloudflare's security posture (TLS encryption in transit, AES-256 at rest, no training data retention per their acceptable use policy) is entirely sufficient. But regulated workloads must route around Workers AI.


Cost Optimization Strategies

Each platform rewards different optimization approaches. Understanding the billing model is essential for keeping AI costs predictable at scale.

Workers AI cost optimization centers on the Workers Paid plan's neuron allocation. At $5/month, the plan includes 10 million neurons; at the ~0.027 neurons/token Llama 70B rate above, that allocation covers far more inference than most bolt-on AI features consume. For teams with modest AI needs integrated into a broader Workers application, the $5/month plan often covers all inference costs without additional per-neuron charges. The primary lever is model selection: Llama 8B costs roughly a quarter of the neurons per token compared to Llama 70B, and for classification, routing, and simple summarization tasks, the quality difference is minimal. A practical pattern is to run a lightweight 8B model as an intent classifier to determine task complexity, then route only tasks that require it to the 70B model. This routing adds a few milliseconds of latency but can reduce neuron consumption by 60–70% for mixed workloads.
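The classify-then-route pattern can be sketched as follows. The classifier prompt, the `simple`/`complex` label set, and the `env.AI` typing are our assumptions for illustration, not a Cloudflare API:

```typescript
// Route by task complexity: a cheap 8B classifier decides whether the
// 70B model is needed. Model IDs from the Workers AI catalog above.
const FAST_MODEL = '@cf/meta/llama-3.1-8b-instruct';
const QUALITY_MODEL = '@cf/meta/llama-3.3-70b-instruct';

type Complexity = 'simple' | 'complex';

function pickModel(label: Complexity): string {
  return label === 'complex' ? QUALITY_MODEL : FAST_MODEL;
}

// Inside a Worker, the 8B model produces the label first:
async function routedInference(env: { AI: { run: Function } }, prompt: string) {
  const cls = (await env.AI.run(FAST_MODEL, {
    messages: [
      { role: 'system', content: 'Answer only "simple" or "complex": does this task need deep reasoning?' },
      { role: 'user', content: prompt },
    ],
    max_tokens: 4,
  })) as { response?: string };

  const label: Complexity = /complex/i.test(cls.response ?? '') ? 'complex' : 'simple';
  return env.AI.run(pickModel(label), { messages: [{ role: 'user', content: prompt }] });
}
```

Defaulting to `simple` when the classifier output is ambiguous keeps the failure mode cheap; flip the default if answer quality matters more than cost.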

Bedrock cost optimization has two major levers. First, Provisioned Throughput (PT): pre-purchasing guaranteed compute capacity at a model level, available in 1-month and 6-month commitments. At sustained loads above approximately 100K tokens per minute for a single model, Provisioned Throughput typically reduces costs by 20–40% versus on-demand pricing. The 6-month commitment unlocks larger discounts; the trade-off is inflexibility if your traffic patterns shift. Second, model selection matters significantly: Amazon Titan Text Premier is priced at $0.50/1M input tokens versus Llama 3.3 70B at $0.72/1M — for internal RAG pipelines where model quality is comparable, Titan can meaningfully reduce costs. Bedrock Batch inference offers a 50% discount over on-demand for offline processing — use it for nightly embedding refresh, bulk classification, or batch document analysis. The Bedrock Converse API's unified model interface makes swapping models a one-line config change, which makes cost experimentation across models practical.
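A quick way to sanity-check a Provisioned Throughput commitment is to compare it against projected on-demand spend. A sketch; the PT monthly price is a placeholder you must replace with the real per-model-unit rate for your model and commitment term:

```typescript
// Does a flat Provisioned Throughput commitment beat on-demand pricing
// at a given sustained load? Assumes steady traffic over a 30-day month.
function provisionedThroughputWins(
  tokensPerMinute: number,
  onDemandUsdPerMToken: number,  // blended input+output rate
  ptMonthlyUsd: number,          // placeholder: real PT rates are per model unit
): boolean {
  const tokensPerMonth = tokensPerMinute * 60 * 24 * 30;
  const onDemandMonthly = (tokensPerMonth / 1_000_000) * onDemandUsdPerMToken;
  return ptMonthlyUsd < onDemandMonthly;
}
```

Bursty traffic weakens the case for PT considerably, since on-demand only bills for tokens actually processed while the commitment bills around the clock.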

Azure OpenAI cost optimization revolves around Provisioned Throughput Units (PTUs) and model tier selection. PTUs are purchased as monthly capacity commitments and provide guaranteed throughput with lower per-token effective cost than Standard pay-as-you-go at high volumes. GPT-4o mini versus GPT-4o is the highest-impact decision: GPT-4o mini handles extraction, classification, structured output, and summarization at approximately 30x lower cost than GPT-4o. Reserving GPT-4o for complex multi-step reasoning, tool use, and tasks that require superior instruction following — while routing everything else to GPT-4o mini — is the single most effective cost optimization in an Azure OpenAI deployment. Global Standard deployment (instead of regional Standard) routes traffic across Azure regions to maximize throughput and availability at the same price; use it as the default unless you have data residency constraints that require a specific region.


When Not to Use Managed AI Platforms

These three platforms are not the right choice for every use case. If you're a startup or indie developer, direct APIs from OpenAI, Anthropic, or Mistral are simpler to integrate, cheaper at low volumes (no enterprise markup), and give you access to the newest models immediately — managed platforms often lag direct providers by weeks or months when new model versions launch. The GPT-4o-2024-11-20 update, for example, reached Azure OpenAI several weeks after the direct OpenAI API.

If cost efficiency at high volume is your primary concern, direct inference APIs or self-hosted models outperform all three platforms. Running Llama 3.3 70B on a fleet of H100 instances via vLLM costs $0.20–0.40/1M tokens at scale, versus $0.72–0.99/1M on Bedrock. The operational investment in self-hosted inference only makes sense above ~$5K/month in model costs, but the crossover exists and is worth benchmarking for high-volume workloads.
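The crossover argument above can be made concrete with a back-of-the-envelope comparison. All inputs here are rough assumptions, as the paragraph notes; the fixed-cost term stands in for idle GPU capacity and operational engineering time:

```typescript
// Self-hosted vs managed: does a fixed-cost vLLM fleet beat per-token
// managed pricing at a given monthly volume?
function selfHostingSavesMoney(
  tokensPerMonthM: number,        // millions of tokens per month
  managedUsdPerMToken: number,    // e.g. ~0.85 blended for Llama 70B on Bedrock
  selfHostedUsdPerMToken: number, // e.g. ~0.30 on an H100 fleet via vLLM
  fixedOpsUsdPerMonth: number,    // idle capacity + engineering overhead
): boolean {
  const managed = tokensPerMonthM * managedUsdPerMToken;
  const selfHosted = tokensPerMonthM * selfHostedUsdPerMToken + fixedOpsUsdPerMonth;
  return selfHosted < managed;
}
```

The fixed term dominates at low volume, which is why the crossover only appears above several thousand dollars of monthly managed spend.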

If you need the absolute latest models as soon as they release, the direct API path is the only option. Managed platforms trade model recency for enterprise compliance and infrastructure abstraction — a trade-off that is worth it for regulated enterprises but unnecessary overhead for teams without those constraints.


Methodology

Latency benchmarks for Cloudflare Workers AI sourced from Cloudflare's published metrics and Workers AI documentation as of March 2026; actual TTFT varies by model size, region proximity, and current load on the edge node. AWS and Azure latency figures from provider documentation and independent community benchmarks; both vary significantly by region — latency from ap-southeast-1 to us-east-1 Bedrock endpoints adds 150–200ms round-trip that edge-based alternatives avoid entirely. Compliance certifications (FedRAMP, HIPAA, SOC 2, PCI DSS) verified against Azure Trust Center, AWS Artifact, and Cloudflare's trust documentation as of March 2026; certification status changes — always verify directly with your cloud vendor before procurement decisions. Pricing sourced from Workers AI, Bedrock (us-east-1), and Azure OpenAI published pricing pages as of March 2026; Provisioned Throughput pricing varies by model, commitment term, and region. Model catalog entries verified from each provider's documentation as of March 2026.


Discover and compare managed AI APIs at APIScout.

Related: Cloudflare Workers vs Vercel Edge vs Lambda@Edge, DeepSeek vs OpenAI vs Claude: Budget AI 2026, Function Calling in AI APIs
