
Image Generation APIs: DALL-E vs FLUX vs SD 2026

APIScout Team

TL;DR

For simple programmatic image generation: DALL-E 3 (OpenAI) — best prompt adherence, easiest API, no fine-tuning needed. For fine-tuned custom models and lower cost at volume: fal.ai or Replicate running Stable Diffusion XL or FLUX. For highest quality creative work: Midjourney (no real API, only Discord bot). In 2026, FLUX from Black Forest Labs has largely displaced Stable Diffusion as the best open model — and fal.ai runs FLUX faster and cheaper than Replicate. Choose based on whether you need quality, control, or cost.

Key Takeaways

  • DALL-E 3: $0.04-$0.12/image, best prompt adherence, no fine-tuning, OpenAI ecosystem
  • DALL-E 2: $0.016-$0.02/image, cheaper, supports inpainting/editing
  • fal.ai: $0.003-$0.06/image running FLUX/SDXL, GPU-native, fast (~3-5s)
  • Replicate: similar to fal.ai but simpler pricing, 1,000s of community models
  • Midjourney: no API (Discord only), best quality for artistic use
  • FLUX.1: best open image model in 2026, replaces SD for most use cases

DALL-E 3: Best Prompt Adherence

Best for: apps that generate images from user text, product mockups, social media content

import fs from 'node:fs';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Generate a single image:
const response = await openai.images.generate({
  model: 'dall-e-3',
  prompt: 'A cozy coffee shop interior with warm lighting, wooden furniture, and plants',
  size: '1024x1024',     // '1024x1024', '1024x1792', '1792x1024'
  quality: 'standard',   // 'standard' or 'hd' (2x cost, more detail)
  style: 'vivid',        // 'vivid' (dramatic) or 'natural' (realistic)
  n: 1,                  // DALL-E 3 only supports n=1
});

const imageUrl = response.data[0].url;
// URL is temporary (~1 hour) — download immediately
console.log(imageUrl);
// Download and save the image:
const imageResponse = await fetch(imageUrl);
const imageBuffer = Buffer.from(await imageResponse.arrayBuffer());
fs.writeFileSync('output.png', imageBuffer);

// Or return as base64:
const base64Response = await openai.images.generate({
  model: 'dall-e-3',
  prompt: 'A modern tech startup office',
  size: '1024x1024',
  response_format: 'b64_json',  // Return base64 instead of URL
});

const base64 = base64Response.data[0].b64_json!;
// Save: Buffer.from(base64, 'base64')
// Or use in HTML: `data:image/png;base64,${base64}`
// DALL-E 3's killer feature: it automatically rewrites prompts to add detail.
// The revised_prompt field shows what DALL-E actually used:
const revised = await openai.images.generate({
  model: 'dall-e-3',
  prompt: 'coffee shop',
  size: '1024x1024',
});

console.log('Original prompt:', 'coffee shop');
console.log('Revised prompt:', revised.data[0].revised_prompt);
// "A warm and inviting coffee shop with exposed brick walls, soft ambient lighting,
//  wooden tables, cozy armchairs, and steam rising from coffee cups..."

DALL-E 3 Pricing

Size          | Standard | HD
1024×1024     | $0.040   | $0.080
1024×1792     | $0.080   | $0.120
1792×1024     | $0.080   | $0.120
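For budgeting, the table above can be turned into a quick estimator. A sketch — `monthlyCost` and `DALLE3_PRICES` are invented names, with prices copied from the table:

```typescript
// Hypothetical helper: estimate monthly DALL-E 3 spend from the price table above.
const DALLE3_PRICES: Record<string, { standard: number; hd: number }> = {
  '1024x1024': { standard: 0.04, hd: 0.08 },
  '1024x1792': { standard: 0.08, hd: 0.12 },
  '1792x1024': { standard: 0.08, hd: 0.12 },
};

function monthlyCost(
  imagesPerDay: number,
  size: keyof typeof DALLE3_PRICES = '1024x1024',
  quality: 'standard' | 'hd' = 'standard',
): number {
  const perImage = DALLE3_PRICES[size][quality];
  return imagesPerDay * 30 * perImage; // 30-day month
}

console.log(monthlyCost(500));                    // 500 standard squares/day → $600/month
console.log(monthlyCost(500, '1792x1024', 'hd')); // 500 HD wide images/day → $1800/month
```

At these rates, the HD wide format triples the bill for the same volume — worth checking before defaulting to `quality: 'hd'`.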

DALL-E 2: Cheaper + Editing

DALL-E 2 is older and lower quality than DALL-E 3, but supports editing (inpainting) — replacing parts of an image with generated content.

// DALL-E 2 editing — change part of an image:
import fs from 'fs';

const editedImage = await openai.images.edit({
  model: 'dall-e-2',
  image: fs.createReadStream('original.png'),   // Source image (square PNG, < 4MB)
  mask: fs.createReadStream('mask.png'),        // Transparent pixels = areas to replace
  prompt: 'A golden retriever sitting in the chair',
  size: '1024x1024',
  n: 1,
});

// mask.png: fully transparent areas are regenerated, opaque areas are kept
// DALL-E 2 variations — generate similar images:
const variations = await openai.images.createVariation({
  model: 'dall-e-2',
  image: fs.createReadStream('source.png'),
  n: 4,              // Generate 4 variations at once
  size: '512x512',   // 256x256, 512x512, or 1024x1024
});

// Returns 4 different versions of the source image

fal.ai: FLUX at Scale

fal.ai is a GPU inference platform — they run popular image models (FLUX, SDXL) faster and cheaper than Replicate.

// npm install @fal-ai/client
import { fal } from '@fal-ai/client';

fal.config({ credentials: process.env.FAL_KEY });

// FLUX.1 Schnell (fastest, good quality):
const result = await fal.subscribe('fal-ai/flux/schnell', {
  input: {
    prompt: 'A photorealistic image of a futuristic city at night, neon lights, rain',
    image_size: 'landscape_16_9',   // 'square', 'portrait_4_3', 'landscape_16_9', etc.
    num_images: 1,
    num_inference_steps: 4,          // Schnell uses only 4 steps (very fast)
    enable_safety_checker: true,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === 'IN_PROGRESS') {
      console.log('Progress:', update.logs?.map((l) => l.message).join(', '));
    }
  },
});

console.log('Image URL:', result.data.images[0].url);
// FLUX.1 Dev (higher quality, slower):
const devResult = await fal.subscribe('fal-ai/flux/dev', {
  input: {
    prompt: 'Professional portrait photo, studio lighting, bokeh background',
    image_size: { width: 1024, height: 1024 },
    num_inference_steps: 28,  // More steps = better quality
    guidance_scale: 3.5,
    num_images: 1,
    seed: 42,  // Reproducible output
  },
});
// Fine-tuned models on fal.ai (LoRA):
const finetunedResult = await fal.subscribe('fal-ai/flux-lora', {
  input: {
    prompt: 'photo of a PRODUCT in a minimalist setting, white background',
    loras: [
      {
        path: 'your-trained-lora-url',  // Upload your LoRA weights
        scale: 1.0,
      },
    ],
    num_images: 1,
  },
});

fal.ai Pricing

FLUX.1 Schnell:  ~$0.003/image (4 steps, fast)
FLUX.1 Dev:      ~$0.025/image (28 steps, high quality)
SDXL:            ~$0.005/image
Stable Diffusion 3: ~$0.035/image

Compare to DALL-E 3: $0.040/image
fal.ai FLUX Dev is ~40% cheaper than DALL-E 3 with comparable quality

Replicate: 1,000s of Models

Replicate runs almost any open-source image model, including fine-tuned community models.

// npm install replicate
import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// Run a model:
const output = await replicate.run(
  'black-forest-labs/flux-schnell',
  {
    input: {
      prompt: 'a cat wearing a top hat, digital art, highly detailed',
      aspect_ratio: '1:1',
      output_format: 'webp',
      output_quality: 90,
      num_outputs: 1,
    },
  }
);

// output is an array of URLs for the generated images
const imageUrl = (output as string[])[0];
// Stream progress updates:
for await (const event of replicate.stream('black-forest-labs/flux-dev', {
  input: { prompt: 'photorealistic landscape', num_inference_steps: 28 },
})) {
  console.log(event);
}
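`fal.subscribe` handles queueing for you; with Replicate's lower-level predictions API you often poll for completion yourself. A generic backoff loop can cover that — this is a sketch, not part of the Replicate SDK (`pollUntilDone` and `JobStatus` are names invented here):

```typescript
// Generic sketch: poll any async job with exponential backoff until it
// reaches a terminal status. Works for any check function you supply.
type JobStatus = { status: 'starting' | 'processing' | 'succeeded' | 'failed'; output?: unknown };

async function pollUntilDone(
  check: () => Promise<JobStatus>,
  { initialMs = 500, maxMs = 8000, timeoutMs = 120_000 } = {},
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialMs;
  for (;;) {
    const job = await check();
    if (job.status === 'succeeded' || job.status === 'failed') return job;
    if (Date.now() > deadline) throw new Error('Polling timed out');
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxMs); // back off: 500ms, 1s, 2s, ... capped at maxMs
  }
}
```

Pass a closure that fetches the prediction's current status; the backoff keeps you under rate limits without adding much latency to short jobs.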

Replicate Pricing

Billing: per-second of GPU compute
FLUX Schnell:     ~$0.003-0.005/image
FLUX Dev:         ~$0.020-0.030/image
SDXL:             ~$0.005-0.010/image

Replicate also offers deployments (always-on) for consistent latency:
Serverless: cold starts (~5-10s first call), then fast
Deployment: always warm, consistent ~3-5s

Stability AI: Direct API

Stability AI (creators of Stable Diffusion) offers its own API:

// Stability AI REST API (Node 18+: fetch and FormData are global):
import fs from 'node:fs';
const response = await fetch('https://api.stability.ai/v2beta/stable-image/generate/core', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.STABILITY_API_KEY}`,
    Accept: 'image/*',
  },
  body: (() => {
    const form = new FormData();
    form.append('prompt', 'A stunning mountain landscape at golden hour');
    form.append('negative_prompt', 'blurry, low quality, watermark');
    form.append('aspect_ratio', '16:9');
    form.append('output_format', 'png');
    return form;
  })(),
});

const imageBuffer = Buffer.from(await response.arrayBuffer());
fs.writeFileSync('output.png', imageBuffer);
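The snippet above writes whatever bytes come back, but a failed request returns a JSON error body, not an image. A small guard helps — a sketch assuming Node 18+, where `Response` and `Buffer` are global (`ensureImageBytes` is an invented helper):

```typescript
// Sketch: fail loudly when the API returns an error payload instead of image bytes.
async function ensureImageBytes(res: Response): Promise<Buffer> {
  if (!res.ok) {
    // Error responses carry JSON/text, not image data — surface them.
    throw new Error(`Image API error ${res.status}: ${await res.text()}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```

Usage: `const imageBuffer = await ensureImageBytes(response);` — this replaces the bare `Buffer.from(...)` line and turns a silent corrupt-PNG bug into an actionable error message.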

Model Comparison for Common Use Cases

Use case: Product photography mockup
→ Best: DALL-E 3 (prompt adherence) or fine-tuned FLUX on fal.ai
→ Code: openai.images.generate with detailed product description

Use case: Avatar or portrait generation
→ Best: FLUX Dev (fal.ai) or SDXL with LoRA fine-tune
→ Code: fal.subscribe('fal-ai/flux/dev', {...})

Use case: Social media content at scale
→ Best: FLUX Schnell on fal.ai ($0.003/image, 3-5s generation)
→ Code: batch requests to fal.ai for high throughput

Use case: Logo or icon generation
→ Best: DALL-E 3 (better at following specific brand guidelines)
→ Code: openai.images.generate with style: 'natural'

Use case: Edit existing images (inpainting)
→ Best: DALL-E 2 (only major provider with inpainting API)
→ Code: openai.images.edit with mask

Use case: Consistent character across many images
→ Best: Custom LoRA on fal.ai or Replicate
→ Train: train a LoRA on 20-30 images of your character
→ Code: fal.subscribe('fal-ai/flux-lora', { loras: [...] })
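If your application serves several of these use cases, the recommendations above can be hard-coded as a default routing map. A sketch with invented names (`pickProvider`, `UseCase`) — not any provider's API:

```typescript
// Illustrative mapping of the use cases above to a default model/provider choice.
type UseCase =
  | 'product-mockup'
  | 'avatar'
  | 'social-scale'
  | 'logo'
  | 'inpainting'
  | 'consistent-character';

const DEFAULT_MODEL: Record<UseCase, string> = {
  'product-mockup': 'dall-e-3',
  'avatar': 'fal-ai/flux/dev',
  'social-scale': 'fal-ai/flux/schnell',
  'logo': 'dall-e-3',
  'inpainting': 'dall-e-2',
  'consistent-character': 'fal-ai/flux-lora',
};

function pickModel(useCase: UseCase): string {
  return DEFAULT_MODEL[useCase];
}
```

Centralizing the choice in one map makes it trivial to swap providers later as pricing or quality shifts.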

Side-by-Side Comparison

                 | DALL-E 3      | FLUX via fal.ai | Replicate    | Midjourney
Cost/image       | $0.04-0.12    | $0.003-0.025    | $0.005-0.03  | ~$0.01 (plan/month)
Latency          | 8-15s         | 3-8s            | 5-15s        | 30-60s
Fine-tuning      | ❌            | ✅ LoRA         | ✅ LoRA      | ❌
Inpainting       | DALL-E 2 only | ❌              | ❌           | ❌
Prompt adherence | ✅ Best       | Good            | Good         | Great
API              | ✅ REST       | ✅ REST         | ✅ REST      | ❌ Discord only
NSFW option      | ❌            | Optional        | Optional    | ❌
SLA / enterprise | ✅            | Limited         | Limited      | ❌

Prompt Engineering for Better Results

Image generation quality depends heavily on prompt construction. Unlike text generation where natural language works well, image models respond to specific vocabulary patterns that guide composition, style, and quality.

DALL-E 3 prompt engineering: DALL-E 3 automatically rewrites your prompt to add detail (the revised_prompt field shows what it actually used). This means you can provide shorter, more direct prompts and let DALL-E expand them. However, if you need precise control — specific compositions, exact text placement, brand-consistent styles — provide explicit detail. What DALL-E 3 handles well: relative positioning ("a coffee cup on the left side"), mood and lighting ("warm golden hour lighting"), and style references ("in the style of a watercolor illustration"). What it handles poorly: exact text rendering (text in images is still imperfect across all models), counting specific objects, and very precise spatial layouts.

FLUX prompt engineering: FLUX.1 Dev and Pro models follow prompts more literally than older Stable Diffusion models — they don't need "masterpiece, best quality, 8k" boilerplate that SDXL required. Describe what you want directly. Quality modifiers that genuinely help FLUX: technical photography terms ("f/1.8 bokeh," "studio lighting," "DSLR photo"), specific styles ("product photography," "editorial fashion," "documentary style"), and compositional guidance ("close-up portrait," "wide-angle architectural shot"). Negative prompts in FLUX work differently from SDXL — FLUX pays less attention to negative prompts overall; describing what you want positively is more effective than listing what to avoid.

Aspect ratios and composition: different image sizes require different prompt approaches. For portrait format (4:3 or 9:16), include compositional cues that work vertically — "centered portrait," "vertical composition." For landscape format (16:9), horizontal cues work better — "wide panoramic view," "rule of thirds composition." Most models are trained primarily on square and landscape images; portrait compositions sometimes require explicit guidance.

Style consistency across multiple images: for product catalogs or character art requiring consistency, include specific style anchors. For photography: camera model ("Canon 5D Mark IV"), lens type ("85mm portrait lens"), and lighting setup ("three-point studio lighting") create consistent visual language. For illustration: style references ("flat design," "isometric illustration," "watercolor on white paper") and color palette descriptions ("warm earth tones," "pastel color palette"). For brand consistency, train a LoRA on your existing assets — 20-30 reference images is enough to capture a visual style.
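The style-anchor idea can be mechanized: keep one anchor object per catalog or character and append it to every subject prompt. A sketch with invented names (`buildPrompt`, `StyleAnchor`); the anchor values are illustrative:

```typescript
// Sketch: compose a base subject with reusable style anchors so every image
// in a catalog shares the same visual language.
interface StyleAnchor {
  camera?: string;   // e.g. 'Canon 5D Mark IV'
  lens?: string;     // e.g. '85mm portrait lens'
  lighting?: string; // e.g. 'three-point studio lighting'
  palette?: string;  // e.g. 'warm earth tones'
}

function buildPrompt(subject: string, style: StyleAnchor): string {
  const parts = [subject, style.camera, style.lens, style.lighting, style.palette];
  return parts.filter(Boolean).join(', ');
}

const catalogStyle: StyleAnchor = {
  lens: '85mm portrait lens',
  lighting: 'three-point studio lighting',
  palette: 'warm earth tones',
};

console.log(buildPrompt('ceramic coffee mug on oak table', catalogStyle));
// → "ceramic coffee mug on oak table, 85mm portrait lens, three-point studio lighting, warm earth tones"
```

Reusing one anchor object across a whole batch is a cheap approximation of consistency; for strict brand fidelity, the LoRA route above is still the stronger tool.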


Content Moderation and Safety

All major image generation APIs enforce content policies that prevent generating certain categories of content. Understanding these policies before building is essential — a policy violation that disables your API access mid-launch is a serious production incident.

DALL-E 3 / OpenAI: the strictest content policy of the major providers. DALL-E 3 refuses requests involving violence, adult content, real people (unless clearly satirical), and brand logos. The automatic prompt revision also filters many edge cases proactively. If your application generates images from user-provided prompts, you'll encounter more refusals with DALL-E 3 than with open models. OpenAI's moderation endpoint (openai.moderations.create()) can pre-screen prompts before sending to DALL-E, reducing API call waste when users submit flagged content.

fal.ai / Replicate (FLUX, SDXL): open models offer more content flexibility — NSFW content generation is possible with appropriate model configurations and is explicitly supported for verified adult platforms. For SFW applications, both platforms run safety classifiers by default (enable_safety_checker: true in fal.ai). The safety checker adds ~1 second of latency; for high-throughput SFW applications, verify that the default classifier meets your needs before disabling it.

Your own moderation layer: regardless of platform, add a content moderation step before sending user prompts to any image API. Running prompts through OpenAI's moderation API, Google Cloud Natural Language, or a keyword-based classifier catches most problematic content before it reaches the image model. The cost (~$0.002/request for OpenAI moderation) is trivial compared to the risk of your API access being suspended for policy violations.
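A keyword-based pre-screen is the cheapest first layer. The sketch below uses an illustrative two-word blocklist and an invented name (`prescreenPrompt`); in production, pair it with a maintained list and escalate anything borderline to a real moderation API:

```typescript
// Minimal keyword pre-screen — a first line of defense, not a replacement
// for a real moderation API. BLOCKLIST here is an illustrative stub.
const BLOCKLIST = ['gore', 'nude']; // use a maintained list in production

function prescreenPrompt(prompt: string): { allowed: boolean; matched?: string } {
  const normalized = prompt.toLowerCase();
  for (const term of BLOCKLIST) {
    if (normalized.includes(term)) return { allowed: false, matched: term };
  }
  return { allowed: true };
}
```

Running this before the image call costs nothing and filters the obvious cases; the paid moderation endpoint then only sees prompts that pass, keeping per-request moderation spend low.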


Scaling Image Generation in Production

Image generation is GPU-intensive and slow (3-30 seconds per image). Handling it synchronously inside a web request creates poor UX and timeouts. Production architectures use async queues.

Queue-based architecture: when a user requests an image, add a job to a queue and return a job ID immediately. A background worker processes the queue, calls the image API, stores the result, and notifies the client via webhook or WebSocket.

// 1. User requests image generation:
const jobId = await db.imageJobs.create({
  data: { prompt, userId, status: 'queued', createdAt: new Date() },
});

// Return immediately — don't wait for generation
return { jobId };

// 2. Worker processes the queue:
const job = await dequeueNextJob();
const result = await fal.subscribe('fal-ai/flux/dev', {
  input: { prompt: job.prompt, num_images: 1 },
});

// Store result in cloud storage:
const imageUrl = await uploadToS3(result.data.images[0].url);

await db.imageJobs.update({
  where: { id: job.id },
  data: { status: 'complete', imageUrl, completedAt: new Date() },
});

// Notify client via WebSocket or push notification

Caching generated images reduces cost significantly for common prompts. Store results by a normalized hash of the prompt + parameters. For consumer applications where many users might request similar images (e.g., "product photo, white background"), cache hit rates of 20-40% are realistic and can halve your monthly API costs.
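One way to build such a cache key (a sketch — `cacheKey` is an invented helper): hash the normalized prompt together with the sorted generation parameters, so equivalent requests collide on the same entry:

```typescript
import { createHash } from 'node:crypto';

// Sketch: derive a stable cache key from prompt + generation parameters.
// Normalizing whitespace and case lets trivially-different prompts share an entry.
function cacheKey(prompt: string, params: Record<string, string | number>): string {
  const normalizedPrompt = prompt.trim().toLowerCase().replace(/\s+/g, ' ');
  const sortedParams = Object.keys(params)
    .sort() // key order must not affect the hash
    .map((k) => `${k}=${params[k]}`)
    .join('&');
  return createHash('sha256').update(`${normalizedPrompt}|${sortedParams}`).digest('hex');
}
```

Store the generated image's permanent URL under this key; on a hit, skip the API call entirely. Include every parameter that changes the output (model, size, seed, steps) or visually different images will collide.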

Cost estimation for production: at 1,000 images/day using FLUX Dev on fal.ai ($0.025/image), monthly cost is $750. At 10,000 images/day, $7,500/month — at this scale, evaluate whether a dedicated GPU (H100 via Lambda Labs or RunPod, ~$2-4/hour) running a self-hosted FLUX model would be more cost-effective. The crossover point is typically around 5,000-8,000 images/day for FLUX-quality models.
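The crossover arithmetic can be sketched directly (`breakEvenImagesPerDay` is an invented helper; the always-on 24-hour assumption is a deliberate simplification — real GPUs sit partly idle, which pushes the true crossover higher, toward the 5,000-8,000/day range cited above):

```typescript
// Sketch of the self-hosting break-even calculation. Rates are assumptions —
// measure your own throughput and utilization before committing to a GPU.
function breakEvenImagesPerDay(
  apiCostPerImage: number, // e.g. $0.025 for FLUX Dev on fal.ai
  gpuHourlyRate: number,   // e.g. $3/hour for an H100
): number {
  const gpuDailyCost = gpuHourlyRate * 24;          // always-on dedicated GPU
  return Math.ceil(gpuDailyCost / apiCostPerImage); // volume where costs match
}

console.log(breakEvenImagesPerDay(0.025, 3)); // ≈ 2880 images/day at perfect utilization
```

The gap between the idealized ~2,880 and the practical 5,000-8,000 figure is exactly the utilization overhead: cold models, idle hours, and ops time all count against self-hosting.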


Production Notes and Methodology

Storing generated images: image URLs returned by DALL-E 3, fal.ai, and Replicate are temporary — they expire within minutes to an hour. Always download and store generated images to your own object storage (S3, Cloudflare R2, Google Cloud Storage) immediately after generation. Build a pipeline: generate → download → upload to your bucket → store the permanent URL in your database. Never store the provider's temporary URL in your database as a persistent reference. This is the single most common production oversight with image generation APIs — developers build features against the temporary URL, everything works in testing, and then links break in production after expiry.

Pricing figures sourced from DALL-E 3 OpenAI pricing page, fal.ai pricing page, Replicate billing page, and Stability AI pricing page as of March 2026; GPU compute pricing for Replicate is billed by the second and varies by model and hardware tier — estimates shown are representative ranges for common models. Latency figures (DALL-E 8-15s, FLUX Schnell 3-5s, FLUX Dev 8-15s) are typical ranges under normal load; actual latency varies by server region, model, and inference step count. FLUX.1 is developed by Black Forest Labs; the FLUX Schnell (4-step) and FLUX Dev (28-step) variants are available on fal.ai and Replicate as of March 2026. Content policy details sourced from OpenAI Usage Policies and fal.ai Terms of Service as of March 2026; policies change — verify current rules before building applications that may generate edge-case content. Code examples use @fal-ai/client v0.x and replicate npm v0.x.


Compare all image generation APIs at APIScout.

Evaluate Replicate and compare alternatives on APIScout.

Related: Best AI Code Generation APIs 2026, Best AI Image Editing APIs 2026, How AI Is Transforming API Design and Documentation
