One API for text, image, video, and audio generation.
Swappable backends. Production-ready.
Direct streaming for chat, queued processing for batch jobs. Optimized for both real-time and long-running tasks.
Switch between vLLM, OpenAI, Anthropic, and custom models without changing your code. Hot-reload configurations.
Track usage, monitor performance, and optimize prompts with integrated Opik support and Prometheus metrics.
Generate text with AI models (GPT, Llama, Claude, etc.)
curl https://inferencebrain.com/v1/generate/text \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing in simple terms",
    "max_tokens": 200,
    "temperature": 0.7
  }'
Create stunning images from text descriptions
curl https://inferencebrain.com/v1/generate/image \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A serene mountain landscape at sunset",
    "width": 1024,
    "height": 1024
  }'
# Returns task_id for polling
# GET /v1/tasks/{task_id} to check status
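Since image generation is queued, clients poll `GET /v1/tasks/{task_id}` until the job finishes. Below is a minimal polling sketch; the `status` field and its values (`completed`, `failed`) are assumptions for illustration, not documented behavior — check the actual task response shape.

```python
import json
import time
import urllib.request

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://inferencebrain.com/v1"

def poll_task(task_id, fetch=None, interval=2.0, max_attempts=30):
    """Poll GET /v1/tasks/{task_id} until the task reaches a terminal state.

    `fetch` maps a task_id to the decoded status dict; injecting it
    keeps the loop testable without network access.
    """
    if fetch is None:
        def fetch(tid):
            req = urllib.request.Request(
                f"{BASE_URL}/tasks/{tid}",
                headers={"X-API-Key": API_KEY},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)

    for _ in range(max_attempts):
        task = fetch(task_id)
        # "completed" / "failed" are assumed terminal statuses.
        if task.get("status") in ("completed", "failed"):
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish in time")
```

In production you would likely add exponential backoff rather than a fixed interval.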
Stream responses in real time for chat applications
curl -N https://inferencebrain.com/v1/generate/text \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a short story",
    "max_tokens": 500,
    "stream": true
  }'
# Streams Server-Sent Events (SSE)
# data: {"text": "Once", "done": false}
# data: {"text": " upon", "done": false}
# ...
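Client-side, the streamed chunks can be reassembled by reading the `data:` lines and concatenating each `text` field until `done` is true. A sketch, assuming each event body is exactly the JSON shape shown above:

```python
import json

def collect_sse_text(lines):
    """Accumulate `text` fields from SSE data lines until done is true."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comment lines
        event = json.loads(line[len("data:"):])
        parts.append(event.get("text", ""))
        if event.get("done"):
            break
    return "".join(parts)
```

In a real client you would feed this from the HTTP response's line iterator (e.g. `iter_lines()` in `requests` or `httpx`) instead of a pre-collected list.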