AI Inference Made Simple

One API for text, image, video, and audio generation.
Swappable backends. Production-ready.

⚡ Lightning Fast

Direct streaming for chat, queued processing for batch jobs. Optimized for both real-time and long-running tasks.

🔄 Swappable Backends

Switch between vLLM, OpenAI, Anthropic, and custom models without changing your code. Hot-reload configurations.

📊 Built-in Observability

Track usage, monitor performance, and optimize prompts with integrated Opik support and Prometheus metrics.

Quick Start Examples

Generate text with AI models (GPT, Llama, Claude, etc.)

curl https://inferencebrain.com/v1/generate/text \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing in simple terms",
    "max_tokens": 200,
    "temperature": 0.7
  }'

Create stunning images from text descriptions

curl https://inferencebrain.com/v1/generate/image \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A serene mountain landscape at sunset",
    "width": 1024,
    "height": 1024
  }'

# Returns task_id for polling
# GET /v1/tasks/{task_id} to check status
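Since image generation is queued, you poll the task endpoint until the job finishes. A minimal polling sketch is below; the `status` field name and the `completed` value are assumptions about the response shape, so check the API reference for the actual fields:

```shell
# Sketch: poll a queued task until it finishes.
# Assumes the task response contains a "status" field that
# reaches "completed" -- verify against the real API docs.
poll_task() {
  task_id="$1"
  while :; do
    status=$(curl -s "https://inferencebrain.com/v1/tasks/$task_id" \
      -H "X-API-Key: $API_KEY" |
      sed -n 's/.*"status": *"\([^"]*\)".*/\1/p')
    echo "status: $status"
    [ "$status" = "completed" ] && break
    sleep 2
  done
}

# Usage: API_KEY=YOUR_API_KEY poll_task "abc123"
```

The `sed` one-liner is a quick way to pull one field out of the JSON response; for anything beyond a sketch, use `jq -r '.status'` instead.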

Stream responses in real-time for chat applications

curl -N https://inferencebrain.com/v1/generate/text \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a short story",
    "max_tokens": 500,
    "stream": true
  }'

# Streams Server-Sent Events (SSE)
# data: {"text": "Once", "done": false}
# data: {"text": " upon", "done": false}
# ...
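On the client side, the streamed text can be reassembled by stripping the `data: ` prefix from each event and concatenating the `text` fields. A rough sketch using only shell built-ins (the event shape matches the sample output above):

```shell
# Sketch: concatenate streamed tokens from SSE "data:" lines.
parse_sse() {
  while IFS= read -r line; do
    case "$line" in
      'data: '*)
        json=${line#data: }
        # Crude "text" field extraction; fine for this fixed shape,
        # but breaks on escaped quotes -- prefer jq for real payloads.
        text=${json#*\"text\": \"}
        printf '%s' "${text%%\"*}"
        ;;
    esac
  done
}

# Pipe the streaming curl call into it:
#   curl -N ... | parse_sse
# Demo on captured events:
printf 'data: {"text": "Once", "done": false}\ndata: {"text": " upon", "done": false}\n' | parse_sse
echo
```

Running the demo prints the joined tokens (`Once upon`); in a real chat UI you would render each token as it arrives rather than buffering the whole stream.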
View Full Documentation →