controlnet

Generate images that follow a control image's edges, depth, or pose while matching your text prompt.

edit

0.0065 per megapixel of image


Input

Prompt

The prompt describing the image to generate.

Image URL

URL of the control/reference image that guides structural conditioning. Its aspect ratio also drives the output size.

Preprocess

How to preprocess the control image before conditioning. 'none' uses the image directly; 'canny' extracts edges; 'depth' estimates a depth map; 'pose' extracts body pose.

Image Size

Output size. Use a preset string (e.g. 'landscape_16_9') or a custom {width, height} object. 'auto' derives the size from the control image's aspect ratio.
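As a rough illustration of the three shapes `image_size` accepts, the sketch below normalizes a caller's choice into a preset string, a custom object, or `auto`. `resolveImageSize` is a hypothetical helper, not part of any client library. Only `auto` and `landscape_16_9` are preset names mentioned on this page, and the positive-integer check is an assumption rather than a documented rule.

```js
// Hypothetical helper: normalizes a caller's choice into an `image_size` value.
// Only "auto" and "landscape_16_9" are preset names confirmed by this page.
function resolveImageSize(size) {
  if (size === undefined || size === null) {
    // Let the service derive the size from the control image's aspect ratio.
    return "auto";
  }
  if (typeof size === "string") {
    // Preset string, passed through unchanged.
    return size;
  }
  const { width, height } = size;
  // Assumption: custom dimensions must be positive integers.
  if (!Number.isInteger(width) || !Number.isInteger(height) || width <= 0 || height <= 0) {
    throw new RangeError("custom image_size needs positive integer width and height");
  }
  return { width, height };
}

console.log(resolveImageSize());
console.log(resolveImageSize("landscape_16_9"));
console.log(resolveImageSize({ width: 1024, height: 768 }));
```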

Control Scale

Min: 0 - Max: 1

How strongly the control image conditions the result (0 = ignore control, 1 = strongest conditioning).

Additional Settings

Customize your input with more control.

Control Start

Min: 0 - Max: 1

Fraction of the denoising process at which ControlNet conditioning begins.

Control End

Min: 0 - Max: 1

Fraction of the denoising process at which ControlNet conditioning ends.
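A minimal sketch of a client-side check for the conditioning window, assuming you want to catch bad values before sending a request. `checkControlWindow` is a hypothetical helper. The 0 to 1 bounds come from this page; rejecting a start later than the end is an assumption, since the page does not say how the service treats that case.

```js
// Throws unless `value` is a number in the documented 0..1 range.
function assertUnitRange(name, value) {
  if (typeof value !== "number" || Number.isNaN(value) || value < 0 || value > 1) {
    throw new RangeError(`${name} must be a number between 0 and 1, got ${value}`);
  }
}

// Hypothetical helper: validates the conditioning window before a request.
function checkControlWindow(controlStart, controlEnd) {
  assertUnitRange("control_start", controlStart);
  assertUnitRange("control_end", controlEnd);
  // Assumption: a start later than the end is treated as a caller mistake.
  if (controlStart > controlEnd) {
    throw new RangeError("control_start must not be greater than control_end");
  }
  return { control_start: controlStart, control_end: controlEnd };
}

// Conditioning from the first step until 80% of the denoising process.
console.log(checkControlWindow(0, 0.8));
```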

Number of Inference Steps

Min: 1 - Max: 8

The number of inference steps to perform.

Seed

The same seed and the same prompt given to the same version of the model will output the same image every time.

Number of Images

Min: 1 - Max: 4

The number of images to generate. Each generated image is billed.
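Because every generated image is billed at the per-megapixel rate, a rough estimate is rate × megapixels × image count. The sketch below assumes one megapixel is 1,000,000 pixels and that no rounding is applied; neither is stated on this page. It also leaves out the prompt-expansion surcharge, whose amount is not given, and the currency, which is not stated. `estimateCost` is a hypothetical helper.

```js
// Rate taken from this page; the currency is not stated there.
const PRICE_PER_MEGAPIXEL = 0.0065;

// Hypothetical helper: estimates the charge for one request.
// Assumptions: 1 MP = 1,000,000 pixels, no rounding, no prompt-expansion surcharge.
function estimateCost(width, height, numImages = 1) {
  if (!Number.isInteger(numImages) || numImages < 1 || numImages > 4) {
    throw new RangeError("num_images must be an integer from 1 to 4");
  }
  const megapixels = (width * height) / 1e6;
  return PRICE_PER_MEGAPIXEL * megapixels * numImages;
}

// Four 1024 x 1024 images, about 1.05 MP each.
console.log(estimateCost(1024, 1024, 4).toFixed(4)); // prints "0.0273"
```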

Output Format

The format of the generated image.

Acceleration

The acceleration level to use. Higher acceleration is faster but may reduce quality.

Enable Prompt Expansion

If true, the prompt is automatically expanded/enriched before generation. Enabling this increases the price by a small per-request surcharge.

Enable Safety Checker

The safety checker can only be disabled through an API call.


Output

{
  "error": "",
  "inferenceTime": 5231,
  "output": [
    "https://media.modelrunner.ai/WTaGOfHRpAm8Uk5BYCRCx.png"
  ],
  "input": {
    "prompt": "A vivid watercolor painting of a white lighthouse with a red lantern room standing on a rocky cliff above crashing turquoise waves at golden-hour sunset, soft washes of orange and pink sky",
    "image_url": "https://media.modelrunner.ai/9syrS9NnVrVQDHybkC4AX.png",
    "image_size": "auto",
    "num_images": 1,
    "preprocess": "none",
    "control_end": 0.8,
    "acceleration": "regular",
    "control_scale": 0.75,
    "control_start": 0,
    "output_format": "png",
    "num_inference_steps": 8,
    "enable_safety_checker": true,
    "enable_prompt_expansion": false
  },
  "logs": "Generated 1 output(s)"
}

Generated in 5.231 seconds
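One way to consume a response shaped like the JSON above is sketched here. `summarizeResult` is a hypothetical helper. Reading `inferenceTime` as milliseconds is an assumption, though it agrees with the 5.231 seconds reported for this run. Whether the JavaScript client returns exactly this shape is not confirmed on this page.

```js
// Hypothetical helper: summarizes a response shaped like the sample above.
function summarizeResult(result) {
  // An empty `error` string is treated as success, as in the sample.
  if (result.error) {
    throw new Error(`generation failed: ${result.error}`);
  }
  return {
    urls: result.output ?? [],
    // Assumption: inferenceTime is reported in milliseconds.
    seconds: result.inferenceTime / 1000,
  };
}

const sample = {
  error: "",
  inferenceTime: 5231,
  output: ["https://media.modelrunner.ai/WTaGOfHRpAm8Uk5BYCRCx.png"],
  logs: "Generated 1 output(s)",
};

const summary = summarizeResult(sample);
console.log(`${summary.urls.length} image(s) in ${summary.seconds} seconds`);
```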


Model Details

Z-Image Turbo ControlNet generates images that follow the structure of a control image while matching a text prompt. Provide a reference image and choose how it conditions the result: by its edges (canny), its depth, or its human pose. The output keeps the composition, geometry, or figure layout of the source, while the prompt sets subject, style, and details. Built on the 6B Z-Image Turbo architecture, it runs in 1 to 8 inference steps for near-real-time turnaround and can return up to 4 images per request.

## Best for

- Redrawing a scene while keeping its layout: feed a photo and prompt a new subject or style with the same composition
- Edge-guided generation from a sketch or line drawing (canny) so the output traces your linework
- Depth-guided generation that preserves 3D structure and camera perspective from a reference
- Pose-controlled character or figure generation that copies a body pose from a reference photo
- Fast, cheap structural-conditioning iterations where you want many variations at low cost

## Choose another model when

- You want to transform an image by prompt strength alone with no structural map: use the z-image image-to-image variant
- You want a pure text-to-image render with no reference image to anchor to: use a text-to-image model
- You need to edit specific regions of an existing image with a mask: use an inpainting/edit model
- You need video output: use an image-to-video model

## Tips

- Set `preprocess` to match your control image: `canny` for line art / edges, `depth` for 3D structure, `pose` for figures, or `none` to condition on the raw image
- Tune `control_scale` (0–1, default 0.75) to trade prompt freedom against how tightly the output follows the control image; lower it if the result feels over-constrained
- Use `control_start` and `control_end` to apply conditioning only during part of the denoising process; ending early (e.g. 0.8) lets the model add prompt-driven detail late
- Leave `image_size` at `auto` to inherit the control image's aspect ratio, or pass a preset / custom `{width, height}`
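To compare how tightly the output follows the control image, one option is to prepare the same request at several `control_scale` values. The sketch below only builds the input objects and does not call the API. `buildControlScaleSweep` is a hypothetical helper, and the prompt and scale values are illustrative.

```js
// Hypothetical helper: copies a base input once per control_scale value.
function buildControlScaleSweep(baseInput, scales) {
  return scales.map((control_scale) => {
    // The documented range for control_scale is 0 to 1.
    if (typeof control_scale !== "number" || control_scale < 0 || control_scale > 1) {
      throw new RangeError(`control_scale must be between 0 and 1, got ${control_scale}`);
    }
    // Spread into a new object so the base input is left untouched.
    return { ...baseInput, control_scale };
  });
}

const base = {
  prompt: "A futuristic city skyline at night, neon lights",
  image_url: "https://media.modelrunner.ai/2ZBTR6fvTxz172zb027cJ.png",
  preprocess: "canny",
  control_start: 0,
  control_end: 0.8,
};

const inputs = buildControlScaleSweep(base, [0.5, 0.75, 1]);
console.log(inputs.map((input) => input.control_scale)); // prints [ 0.5, 0.75, 1 ]
```

Each object could then be passed as `input` to the client call shown in this section. Keeping the seed fixed across the sweep should make differences easier to attribute to `control_scale` alone.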

## Advanced Configuration

- `enable_prompt_expansion` (default false) rewrites your prompt for richer detail; enabling it adds a small per-request surcharge.
- `acceleration` (`none` / `regular` / `high`) trades a little quality for speed.

To run via the ModelRunner JavaScript client:

```js
import { modelrunner } from "@modelrunner/client";

const result = await modelrunner.subscribe("tongyi-mai/z-image/turbo/controlnet", {
  input: {
    prompt: "A futuristic city skyline at night, neon lights",
    image_url: "https://media.modelrunner.ai/2ZBTR6fvTxz172zb027cJ.png",
    preprocess: "canny",
    control_scale: 0.75,
    image_size: "landscape_16_9",
  },
});
```
