AI Pipelines - Livepeer Docs

The Livepeer AI Gateway exposes eight batch pipelines and one LLM pipeline through HTTP POST endpoints. Each pipeline accepts a request body (JSON for text-only pipelines, multipart/form-data for pipelines that take a file upload) containing an optional model_id and pipeline-specific fields, and returns a JSON response with the result. Real-time video AI (live-video-to-video) runs through the trickle protocol and is covered separately in the real-time AI overview. For warm models, VRAM requirements, and architecture support per pipeline, see model support. For SDK wrappers, see AI SDKs.

Shared conventions

Base URL: Any Livepeer Gateway endpoint. The community Gateway at https://dream-gateway.livepeer.cloud accepts unauthenticated requests for development.

Authentication: Bearer token when the Gateway requires it. The community Gateway does not require a token.

Request format: POST /<pipeline-endpoint> with Content-Type: application/json for text-only pipelines, or multipart/form-data for pipelines that take a file upload.

model_id field: Every batch pipeline accepts a model_id field specifying the Hugging Face model ID. Omitting model_id uses the pipeline's default warm model. The LLM pipeline instead takes a required model field containing an Ollama-compatible model ID.

Error responses: 400 for malformed requests, 422 for validation errors (invalid model_id, missing required fields), 500 for inference failures. Error bodies include a detail field with the failure reason.

Cold model latency: If no Orchestrator has the requested model warm in GPU memory, the first request triggers a model load (30 seconds to 5 minutes depending on model size). Subsequent requests to the same model on the same Orchestrator skip the load.
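The conventions above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names build_json_request and describe_error are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

# Status meanings as listed in the shared conventions.
ERROR_KINDS = {
    400: "malformed request",
    422: "validation error",
    500: "inference failure",
}


def build_json_request(base_url, endpoint, payload, token=None):
    """Build (but do not send) a POST request for a JSON pipeline."""
    headers = {"Content-Type": "application/json"}
    if token:  # only Gateways that require auth need this header
        headers["Authorization"] = f"Bearer {token}"
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def describe_error(status, body_text):
    """Combine the status meaning with the `detail` field, if present."""
    kind = ERROR_KINDS.get(status, "unexpected error")
    try:
        detail = json.loads(body_text).get("detail")
    except (ValueError, AttributeError):
        detail = None  # body was not a JSON object
    return f"{status} {kind}: {detail}" if detail else f"{status} {kind}"
```

The resulting request can be passed to urllib.request.urlopen with a generous timeout, since a cold model may take several minutes to load before the first response arrives.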

Pipeline reference

text-to-image

Generate images from text prompts using diffusion models (SDXL, SD 1.5, Flux).

curl -X POST https://dream-gateway.livepeer.cloud/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "SG161222/RealVisXL_V4.0_Lightning",
    "prompt": "a glowing neural network in a dark room",
    "width": 1024,
    "height": 1024,
    "guidance_scale": 7.5,
    "num_inference_steps": 8,
    "seed": 42
  }'

Field	Type	Required	Description
`model_id`	string	No	Hugging Face model ID. Default: `SG161222/RealVisXL_V4.0_Lightning`
`prompt`	string	Yes	Text prompt for generation
`negative_prompt`	string	No	Terms to avoid in generation
`width`	integer	No	Output width in pixels (default: 1024)
`height`	integer	No	Output height in pixels (default: 1024)
`guidance_scale`	number	No	Classifier-free guidance scale (default: 7.5)
`num_inference_steps`	integer	No	Denoising steps (default depends on model; Lightning models use 4-8)
`seed`	integer	No	Random seed for reproducibility
`num_images_per_prompt`	integer	No	Number of images to generate (default: 1)
`safety_check`	boolean	No	Run NSFW safety filter (default: true)

Response: JSON object with images array. Each image is a { url, seed } object.
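As a rough sketch of working with this pipeline from Python, the helpers below assemble a request body from the fields in the table and read the { url, seed } objects out of a parsed response. The function names are hypothetical, and the strict rejection of unknown fields is a choice made for this example, not documented Gateway behaviour.

```python
# Optional fields taken from the text-to-image field table.
OPTIONAL_FIELDS = {
    "model_id", "negative_prompt", "width", "height", "guidance_scale",
    "num_inference_steps", "seed", "num_images_per_prompt", "safety_check",
}


def text_to_image_payload(prompt, **options):
    """Return a request body, keeping only known fields that were set."""
    if not prompt:
        raise ValueError("prompt is required")
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"prompt": prompt}
    # Leave unset options out so the Gateway applies its own defaults.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


def image_results(response):
    """Return (url, seed) pairs from a parsed text-to-image response."""
    return [(img["url"], img.get("seed")) for img in response.get("images", [])]
```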

image-to-image

Transform images using style transfer, enhancement, or img2img diffusion.

curl -X POST https://dream-gateway.livepeer.cloud/image-to-image \
  -F "model_id=timbrooks/instruct-pix2pix" \
  -F "prompt=make it look like a watercolour painting" \
  -F "image=@input.png" \
  -F "strength=0.8"

Field	Type	Required	Description
`model_id`	string	No	Default: `timbrooks/instruct-pix2pix`
`image`	file	Yes	Input image (multipart form upload)
`prompt`	string	Yes	Transformation instruction
`strength`	number	No	How much to transform (0.0 = no change, 1.0 = full regeneration)
`guidance_scale`	number	No	Guidance scale (default: 7.5)
`num_inference_steps`	integer	No	Denoising steps
`seed`	integer	No	Random seed
`safety_check`	boolean	No	NSFW filter (default: true)

Response: JSON with images array, same format as text-to-image.

image-to-image uses multipart/form-data, not application/json. The image is uploaded as a file field.
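For clients that do not have a multipart helper available, the body can be assembled by hand. The sketch below is one way to do that with the Python standard library; encode_multipart is a hypothetical helper, and it assumes field names and filenames are plain ASCII without quotes (no escaping is applied).

```python
import uuid


def encode_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict of name -> str
    files:  dict of name -> (filename, data bytes, MIME type)
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    for name, (filename, data, mime) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # The closing delimiter carries two extra trailing hyphens.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

The first return value goes into the Content-Type header of the POST request, because the server needs the boundary string to split the body into parts.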

image-to-video

Animate a still image into a short video clip using Stable Video Diffusion.

curl -X POST https://dream-gateway.livepeer.cloud/image-to-video \
  -F "model_id=stabilityai/stable-video-diffusion-img2vid-xt" \
  -F "image=@input.png" \
  -F "fps=6" \
  -F "motion_bucket_id=127"

Field	Type	Required	Description
`model_id`	string	No	Default: `stabilityai/stable-video-diffusion-img2vid-xt`
`image`	file	Yes	Input image (multipart form upload)
`fps`	integer	No	Output frames per second (default: 6)
`motion_bucket_id`	integer	No	Motion intensity (0-255; default: 127)
`seed`	integer	No	Random seed
`safety_check`	boolean	No	NSFW filter (default: true)

Response: JSON with frames array containing frame URLs, or a video URL.

Stable Video Diffusion outputs 14 frames (img2vid) or 25 frames (img2vid-xt) at 1024x576 (width x height). Text prompts are not used; the image is the sole conditioning input.

image-to-text

Generate captions or descriptions for images using BLIP or vision-language models.

curl -X POST https://dream-gateway.livepeer.cloud/image-to-text \
  -F "model_id=Salesforce/blip-image-captioning-large" \
  -F "image=@photo.jpg"

Field	Type	Required	Description
`model_id`	string	No	Default: `Salesforce/blip-image-captioning-large`
`image`	file	Yes	Input image (multipart form upload)
`prompt`	string	No	Optional prompt to guide caption content

Response: JSON with text field containing the generated caption.

audio-to-text

Transcribe audio to text with per-chunk timestamps using Whisper.

curl -X POST https://dream-gateway.livepeer.cloud/audio-to-text \
  -F "model_id=openai/whisper-large-v3" \
  -F "audio=@recording.mp3"

Field	Type	Required	Description
`model_id`	string	No	Default: `openai/whisper-large-v3`
`audio`	file	Yes	Audio file (mp4, webm, mp3, flac, wav, m4a). Max 50 MB.

Response: JSON with text (full transcript) and chunks array (per-segment timestamps and text).
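To illustrate how the chunks array might be consumed, the sketch below renders each segment as a timestamped line. The exact chunk shape is not specified here, so this assumes each chunk is an object with a timestamp pair [start, end] in seconds and a text string; adjust the keys if the actual response differs. The function names are hypothetical.

```python
def format_timestamp(seconds):
    """Render a time in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def transcript_lines(response):
    """Turn a parsed audio-to-text response into timestamped lines.

    Assumed chunk shape: {"timestamp": [start, end], "text": "..."}.
    """
    lines = []
    for chunk in response.get("chunks", []):
        start, end = chunk["timestamp"]
        span = f"{format_timestamp(start)} - {format_timestamp(end)}"
        lines.append(f"[{span}] {chunk['text'].strip()}")
    return lines
```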

text-to-speech

Generate natural speech from text using Parler-TTS.

curl -X POST https://dream-gateway.livepeer.cloud/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "parler-tts/parler-tts-large-v1",
    "text": "Livepeer is a decentralised video infrastructure network.",
    "description": "A female speaker with a warm, clear voice and moderate pace."
  }'

Field	Type	Required	Description
`model_id`	string	No	Default: `parler-tts/parler-tts-large-v1`
`text`	string	Yes	Text to synthesise. Max ~600 characters; chunk longer text.
`description`	string	No	Voice characteristics (speaker identity, style, audio quality)

Response: JSON with audio object containing a URL to the generated audio file.

The text-to-speech pipeline requires a pipeline-specific AI Runner container, so not all Orchestrators have it active.
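Because the text field is limited to roughly 600 characters, longer input has to be split before it is sent. The sketch below shows one possible chunking strategy: split on sentence boundaries, pack sentences greedily, and hard-split any single sentence that exceeds the limit. The chunk_text name and the sentence-splitting rule are assumptions for this example, and the 600-character default mirrors the approximate limit stated above.

```python
import re


def chunk_text(text, limit=600):
    """Split text into chunks of at most `limit` characters.

    Sentences are packed greedily; a sentence longer than `limit`
    is cut into limit-sized pieces.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        while len(sentence) > limit:
            if current:  # flush what has been packed so far
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate text-to-speech request with the same description, and the resulting audio files concatenated on the client side.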

upscale

Upscale low-resolution images using the SD x4-Upscaler (4x super-resolution).

curl -X POST https://dream-gateway.livepeer.cloud/upscale \
  -F "model_id=stabilityai/stable-diffusion-x4-upscaler" \
  -F "image=@lowres.png" \
  -F "prompt=high quality, sharp details"

Field	Type	Required	Description
`model_id`	string	No	Default: `stabilityai/stable-diffusion-x4-upscaler`
`image`	file	Yes	Input image (multipart form upload)
`prompt`	string	No	Optional quality guidance prompt
`seed`	integer	No	Random seed
`safety_check`	boolean	No	NSFW filter (default: true)

Response: JSON with images array, same format as text-to-image.

segment-anything-2

Promptable visual segmentation for images using SAM 2 (Meta AI).

curl -X POST https://dream-gateway.livepeer.cloud/segment-anything-2 \
  -F "model_id=facebook/sam2-hiera-large" \
  -F "image=@photo.jpg" \
  -F 'point_coords=[[500,375]]' \
  -F 'point_labels=[1]'

Field	Type	Required	Description
`model_id`	string	No	Default: `facebook/sam2-hiera-large`
`image`	file	Yes	Input image
`point_coords`	array	No	Point prompts as `[[x,y], ...]`
`point_labels`	array	No	Labels for points (1 = foreground, 0 = background)
`box`	array	No	Bounding box prompt `[x1, y1, x2, y2]`

Response: JSON with masks, scores, and logits arrays.
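In the curl example, the point and box prompts travel as JSON-encoded strings inside multipart form fields. A minimal sketch of preparing those fields in Python follows. The helper name is hypothetical, and the checks (points and labels supplied together, one label per point) are assumptions based on how SAM 2 prompts are normally paired, not documented Gateway validation.

```python
import json


def sam2_form_fields(point_coords=None, point_labels=None, box=None, model_id=None):
    """Return text form fields for a segment-anything-2 request."""
    if (point_coords is None) != (point_labels is None):
        raise ValueError("point_coords and point_labels must be given together")
    if point_coords is not None:
        if len(point_coords) != len(point_labels):
            raise ValueError("need exactly one label per point")
        if any(label not in (0, 1) for label in point_labels):
            raise ValueError("labels must be 1 (foreground) or 0 (background)")
    if box is not None and len(box) != 4:
        raise ValueError("box must be [x1, y1, x2, y2]")

    compact = {"separators": (",", ":")}  # matches the curl example's formatting
    fields = {}
    if model_id:
        fields["model_id"] = model_id
    if point_coords is not None:
        fields["point_coords"] = json.dumps(point_coords, **compact)
        fields["point_labels"] = json.dumps(point_labels, **compact)
    if box is not None:
        fields["box"] = json.dumps(box, **compact)
    return fields
```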

llm

OpenAI-compatible chat completions using an Ollama-based runner.

curl -X POST https://dream-gateway.livepeer.cloud/llm \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "user", "content": "Explain Livepeer in one sentence."}
    ]
  }'

Field	Type	Required	Description
`model`	string	Yes	Ollama-compatible model ID
`messages`	array	Yes	OpenAI-format message array (`role` + `content`)
`max_tokens`	integer	No	Maximum output tokens
`temperature`	number	No	Sampling temperature (0.0-2.0)
`stream`	boolean	No	Stream response tokens (SSE)

Response: OpenAI-compatible chat completion object with choices[0].message.content.

The LLM pipeline is in beta. The request format follows the OpenAI /v1/chat/completions shape. Supported models include Meta-Llama-3.1-8B-Instruct (warm, 8 GB VRAM), Mistral-7B-Instruct-v0.3, Gemma-2-9b-it, and Qwen2.5-7B-Instruct.
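Since the request and response follow the OpenAI chat completions shape, a small pair of helpers is enough to build a request body and pull out the reply. This is a sketch under that assumption: chat_payload and first_reply are hypothetical names, and the temperature check simply mirrors the 0.0-2.0 range from the field table.

```python
def chat_payload(model, user_message, system_message=None,
                 max_tokens=None, temperature=None, stream=False):
    """Build a request body for the llm pipeline."""
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": user_message})
    payload = {"model": model, "messages": messages}
    # Optional fields are only included when explicitly set.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    return payload


def first_reply(response):
    """Return choices[0].message.content from a parsed completion."""
    return response["choices"][0]["message"]["content"]
```

Note that first_reply applies to non-streaming responses only; with stream set to true, the reply arrives as a sequence of server-sent events instead of a single JSON object.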

Operational notes

Multipart vs JSON. Pipelines that accept file uploads (image-to-image, image-to-video, image-to-text, audio-to-text, upscale, segment-anything-2) use multipart/form-data. Pipelines that accept only text input (text-to-image, text-to-speech, LLM) use application/json.

Gateway selection. The community Gateway routes to whichever Orchestrator in the Active Set has the requested model warm. For production, operate a self-hosted Gateway with -maxPricePerUnit to control costs, or use a Gateway provider with an API key.

safety_check filter. Enabled by default on image-generating pipelines. Set to false to disable. The filter runs on the Orchestrator side; disabling it does not affect content moderation policies that the Gateway operator may enforce.

The AI quickstart walks through the first inference call end-to-end with error handling.
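The multipart/JSON split described in these notes can be captured in a small lookup so that client code picks the right encoding per pipeline. The sketch below is illustrative only; the constant and function names are hypothetical.

```python
# Pipelines that take a file upload, per the operational notes.
MULTIPART_PIPELINES = frozenset({
    "image-to-image", "image-to-video", "image-to-text",
    "audio-to-text", "upscale", "segment-anything-2",
})

# Pipelines that take only text input.
JSON_PIPELINES = frozenset({"text-to-image", "text-to-speech", "llm"})


def content_type_for(pipeline):
    """Return the request encoding a pipeline endpoint expects.

    For multipart requests the real header must also carry a
    boundary parameter, which the HTTP client normally adds.
    """
    if pipeline in MULTIPART_PIPELINES:
        return "multipart/form-data"
    if pipeline in JSON_PIPELINES:
        return "application/json"
    raise ValueError(f"unknown pipeline: {pipeline}")
```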
