tongyi-mai / z-image/turbo/controlnet

Generate images that follow a control image's edges, depth, or pose while matching your text prompt.

0.0065 per megapixel of image

Model Input

The prompt describing the image to generate.

URL of the control/reference image that guides structural conditioning. Its aspect ratio also drives the output size.

How to preprocess the control image before conditioning. 'none' uses the image directly; 'canny' extracts edges; 'depth' estimates a depth map; 'pose' extracts body pose.

Output size. Use a preset string (e.g. 'landscape_16_9') or a custom {width, height} object. 'auto' derives the size from the control image's aspect ratio.

How strongly the control image conditions the result, from 0 (ignore control) to 1 (strongest conditioning).

Additional Settings

Customize your input with more control.

Fraction of the denoising process at which ControlNet conditioning begins (0 to 1).

Fraction of the denoising process at which ControlNet conditioning ends (0 to 1).

The number of inference steps to perform (1 to 8).

The same seed and the same prompt given to the same version of the model will output the same image every time.

The number of images to generate (1 to 4). Each generated image is billed.

The format of the generated image.

The acceleration level to use. Higher acceleration is faster but may reduce quality.

If true, the prompt is automatically expanded/enriched before generation. Enabling this increases the price by a small per-request surcharge.

The safety checker can only be disabled through an API call.

Model Details

Z-Image Turbo ControlNet generates images that follow the structure of a control image while matching a text prompt. Provide a reference image and choose how it conditions the result (by its edges via canny, its depth, or its human pose) so the output keeps the composition, geometry, or figure layout of the source while the prompt sets subject, style, and details. Built on the 6B Z-Image Turbo architecture, it runs in 1 to 8 inference steps for near-real-time turnaround and can return up to 4 images per request.

## Best for

- Redrawing a scene while keeping its layout: feed a photo and prompt a new subject or style with the same composition
- Edge-guided generation from a sketch or line drawing (canny) so the output traces your linework
- Depth-guided generation that preserves 3D structure and camera perspective from a reference
- Pose-controlled character or figure generation that copies a body pose from a reference photo
- Fast, cheap structural-conditioning iterations where you want many variations at low cost

## Choose another model when

- You want to transform an image by prompt strength alone with no structural map: use the z-image image-to-image variant
- You want a pure text-to-image render with no reference image to anchor to: use a text-to-image model
- You need to edit specific regions of an existing image with a mask: use an inpainting/edit model
- You need video output: use an image-to-video model

## Tips

- Set `preprocess` to match your control image: `canny` for line art / edges, `depth` for 3D structure, `pose` for figures, or `none` to condition on the raw image
- Tune `control_scale` (0–1, default 0.75) to trade prompt freedom against how tightly the output follows the control image; lower it if the result feels over-constrained
- Use `control_start` and `control_end` to apply conditioning only during part of the denoising process; ending early (e.g. 0.8) lets the model add prompt-driven detail late
- Leave `image_size` at `auto` to inherit the control image's aspect ratio, or pass a preset / custom `{width, height}`
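The tips above can be sketched as a small input builder that checks the documented ranges before a request is sent. This is only an illustration: `buildControlnetInput` and `assertUnitRange` are hypothetical helpers, not part of any client library, and the rules that `prompt` and `image_url` are required and that `control_start` must not exceed `control_end` are assumptions rather than documented behavior. The field names and the `preprocess` values come from this page.

```js
// Preprocess modes listed on this page.
const PREPROCESS_MODES = ["none", "canny", "depth", "pose"];

// Hypothetical helper: rejects values outside the documented 0..1 range.
function assertUnitRange(name, value) {
  if (typeof value !== "number" || Number.isNaN(value) || value < 0 || value > 1) {
    throw new RangeError(`${name} must be a number between 0 and 1, got ${value}`);
  }
}

// Hypothetical helper: builds the `input` object for a ControlNet request.
// Optional fields are omitted when not supplied, so the service defaults apply.
function buildControlnetInput(options) {
  const { prompt, imageUrl, preprocess, controlScale, controlStart, controlEnd, imageSize } = options;

  // Assumption: both the prompt and the control image are required.
  if (!prompt || !imageUrl) {
    throw new Error("prompt and imageUrl are required");
  }
  if (!PREPROCESS_MODES.includes(preprocess)) {
    throw new Error(`preprocess must be one of ${PREPROCESS_MODES.join(", ")}`);
  }

  const input = { prompt, image_url: imageUrl, preprocess };

  if (controlScale !== undefined) {
    assertUnitRange("control_scale", controlScale);
    input.control_scale = controlScale;
  }
  if (controlStart !== undefined) {
    assertUnitRange("control_start", controlStart);
    input.control_start = controlStart;
  }
  if (controlEnd !== undefined) {
    assertUnitRange("control_end", controlEnd);
    input.control_end = controlEnd;
  }
  // Assumption: a conditioning window that starts after it ends is a mistake.
  if (controlStart !== undefined && controlEnd !== undefined && controlStart > controlEnd) {
    throw new RangeError("control_start must not be greater than control_end");
  }
  // Either a preset string such as "auto" or a custom { width, height } object.
  if (imageSize !== undefined) {
    input.image_size = imageSize;
  }
  return input;
}

// Example: follow a line drawing loosely and release conditioning at 80% of denoising.
const input = buildControlnetInput({
  prompt: "A watercolor painting of a lighthouse at dusk",
  imageUrl: "https://example.com/line-art.png", // placeholder URL
  preprocess: "canny",
  controlScale: 0.6,
  controlEnd: 0.8,
  imageSize: { width: 1024, height: 768 },
});
console.log(input);
```

The resulting object can be passed as the `input` field of a request. Leaving `controlStart` unset keeps that field out of the payload entirely instead of guessing a default.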

## Advanced Configuration

- `enable_prompt_expansion` (default false) rewrites your prompt for richer detail; enabling it adds a small per-request surcharge.
- `acceleration` (`none` / `regular` / `high`) trades a little quality for speed.
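For budgeting, the listed rate of 0.0065 per megapixel, billed per generated image, can be turned into a rough estimate. The sketch below is an approximation under stated assumptions: `estimateCost` is a hypothetical helper, one megapixel is taken as 1,000,000 pixels with no rounding, the currency is whatever the listed rate is quoted in, and the prompt-expansion surcharge is left out because its amount is not given here.

```js
// Rate from the pricing line of this page; the currency is not stated there.
const PRICE_PER_MEGAPIXEL = 0.0065;

// Hypothetical helper: estimates the charge for one request.
// Assumptions: 1 megapixel = 1,000,000 pixels, billing is not rounded up,
// and the prompt-expansion surcharge is excluded.
function estimateCost(width, height, numImages = 1) {
  if (!(width > 0) || !(height > 0)) {
    throw new RangeError("width and height must be positive numbers");
  }
  // The page documents 1 to 4 images per request.
  if (!Number.isInteger(numImages) || numImages < 1 || numImages > 4) {
    throw new RangeError("numImages must be an integer between 1 and 4");
  }
  const megapixels = (width * height) / 1e6;
  return PRICE_PER_MEGAPIXEL * megapixels * numImages;
}

// Example: four 1024x768 images in one request.
const estimate = estimateCost(1024, 768, 4);
console.log(estimate.toFixed(4));
```

Actual billing may round or measure image size differently, so treat the number as a planning figure only.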

To run via the ModelRunner JavaScript client:

```js
import { modelrunner } from "@modelrunner/client";

const result = await modelrunner.subscribe("tongyi-mai/z-image/turbo/controlnet", {
  input: {
    prompt: "A futuristic city skyline at night, neon lights",
    image_url: "https://media.modelrunner.ai/2ZBTR6fvTxz172zb027cJ.png",
    preprocess: "canny",
    control_scale: 0.75,
    image_size: "landscape_16_9",
  },
});
```