
moondream / moondream3-preview/query

Ask a natural-language question about an image and get a text answer, with frontier-level visual reasoning, OCR, and object understanding.

Priced per token

Model Input


- `image_url`: URL of the image to ask about.
- `prompt`: Question to ask about the image.
- `reasoning`: Include the model's detailed reasoning in the response.
- `temperature` (min 0, max 1): Sampling temperature. 0 is deterministic; higher values (up to 1) increase variety.
- `top_p` (min 0, max 1): Nucleus sampling probability mass.



Model Details


Moondream 3 Preview answers natural-language questions about an image. Pass an image URL and a question ("What is the person doing?", "How many cars are in the lot?", "What does the sign say?") and it returns a concise text answer. It is a compact, efficient vision-language model built for frontier-level visual reasoning — reading text in a scene (OCR), counting and identifying objects, describing what is happening, and grounding answers in fine image detail — while staying fast and inexpensive to run at scale.

## Best for

- Visual question answering: asking free-form questions about a photo, screenshot, chart, or document image
- Reading text inside images (signs, labels, receipts, handwriting) and answering questions about it
- Counting, identifying, and locating objects or people in a scene
- Describing image content for accessibility, moderation triage, or content tagging
- High-volume image understanding where cost-per-call and latency matter

## Choose another model when

- You want to generate or edit an image rather than describe one: use a text-to-image or image-editing model
- You have no image and only need a text answer: use a text-only language model
- You need pixel-precise bounding boxes or segmentation masks as structured output rather than a written answer: use a detection or segmentation model

## Tips

- Ask one clear, specific question per call; specifying the desired answer format ("answer in one short sentence", "reply with just the number") tightens the output.
- Leave `temperature` at its default of 0 for deterministic, factual answers; raise it (up to 1) only when you want more varied phrasing.
- Keep `reasoning` enabled (default) to also receive the model's step-by-step reasoning trace alongside the answer; set it to `false` for just the final answer and lower latency.
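The answer-format tip can be sketched as a pair of small helpers. This is only an illustration: `withAnswerFormat` and `parseCount` are hypothetical names, not part of the ModelRunner client, and the sketch assumes the answer arrives as plain text.

```js
// Hypothetical helper: append an explicit answer-format instruction to a question.
function withAnswerFormat(question, format) {
  return `${question.trim()} ${format.trim()}`;
}

// Hypothetical helper: pull the first run of digits out of a text answer.
// Returns null when the answer contains no digits.
// Note: thousands separators are not handled ("1,200" would yield 1).
function parseCount(answer) {
  const match = String(answer).match(/\d+/);
  return match ? Number.parseInt(match[0], 10) : null;
}

// Build a counting prompt that asks for a bare number.
const prompt = withAnswerFormat(
  "How many cars are in the lot?",
  "Reply with just the number."
);
```

Pass `prompt` as the `prompt` input, then run `parseCount` on the returned answer text. Asking for a bare number keeps the parsing step trivial, while the digit search still copes with an answer that comes back as a short sentence.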

## Advanced Configuration

- `reasoning` (boolean, default `true`): when `true`, the response includes the model's detailed reasoning behind the answer; when `false`, the reasoning trace is omitted and only the answer is returned.
- `temperature` (number 0–1, default `0`): sampling temperature for the answer. `0` is deterministic; higher values increase variety.
- `top_p` (number 0–1): nucleus-sampling probability mass, an alternative way to control answer diversity.
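As a minimal sketch of how these settings fit together, the helper below assembles an `input` object and checks the documented 0 to 1 ranges before any request is made. `buildQueryInput` is a hypothetical name, not part of the ModelRunner client. Because no default is listed for `top_p`, the sketch simply omits it unless the caller supplies one.

```js
// Hypothetical helper (not part of the ModelRunner client): build the `input`
// object for moondream/moondream3-preview/query using the documented defaults.
function buildQueryInput({ imageUrl, prompt, reasoning = true, temperature = 0, topP } = {}) {
  if (typeof imageUrl !== "string" || imageUrl.length === 0) {
    throw new TypeError("imageUrl must be a non-empty string");
  }
  if (typeof prompt !== "string" || prompt.length === 0) {
    throw new TypeError("prompt must be a non-empty string");
  }

  // Both sampling settings are documented as numbers between 0 and 1.
  const inUnitRange = (value) => typeof value === "number" && value >= 0 && value <= 1;
  if (!inUnitRange(temperature)) {
    throw new RangeError("temperature must be a number between 0 and 1");
  }

  const input = { image_url: imageUrl, prompt, reasoning, temperature };

  // top_p has no documented default, so it is only sent when provided.
  if (topP !== undefined) {
    if (!inUnitRange(topP)) {
      throw new RangeError("top_p must be a number between 0 and 1");
    }
    input.top_p = topP;
  }
  return input;
}
```

The returned object can be passed as the `input` field of a request. Validating locally is a design choice of this sketch; the service may apply its own checks as well.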

To run via the ModelRunner JavaScript client:

```js
import { modelrunner } from "@modelrunner/client";

const result = await modelrunner.subscribe("moondream/moondream3-preview/query", {
  input: {
    image_url: "https://storage.googleapis.com/falserverless/example_inputs/moondream-3-preview/query_in.jpg",
    prompt: "What is in this image? Answer in one short sentence.",
    reasoning: false,
  },
});
```