Model Details
GOT-OCR 2.0 reads text out of images. Pass one or more image URLs and it returns the recognized text — one string per image — covering printed documents, handwritten and scene text, as well as structured content like tables, math formulas, charts, and sheet music. Its strength is general optical character recognition across many content types in a single model: where plain OCR stops at flat text, GOT-OCR 2.0 also recovers layout-bearing and notation-heavy material. Enable formatted mode to get the result as Markdown/LaTeX so tables and equations come back with their structure intact rather than as a flat run of characters.
## Best for

- Pulling the text out of scanned or photographed documents and receipts
- Reading scene text and signage from natural photos
- Extracting tables, math formulas, and charts as structured Markdown/LaTeX (set `do_format` to true)
- Transcribing sheet music and other notation-heavy images
- Batch OCR over a set of images in one call, where each image yields its own text string
## Choose another model when

- You want to translate or summarize the recognized text rather than just extract it; use a text model after OCR
- You need word-level bounding boxes or precise layout coordinates; this model returns text, not positional geometry
- Your input is audio or video rather than an image; use a speech-to-text or video model
## Tips

- Pass each page or photo as a separate entry in `input_image_urls`; the output array returns one recognized-text string per image, in order.
- Set `do_format` to true when the image contains tables, equations, or sheet music and you want Markdown/LaTeX structure preserved; leave it false for plain text where a flat transcript is fine.
- Use clear, well-lit, reasonably high-resolution images; blur, glare, and skew reduce recognition accuracy.
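Because the output keeps the same order as `input_image_urls`, each recognized string can be matched back to its source image by index. The following is a minimal sketch, assuming the recognized strings have already been extracted from the response into a plain array. The helper name `pairTextWithImages`, the `texts` array, and the example URLs are illustrative and not part of the ModelRunner client.

```js
// Hypothetical helper: not part of the ModelRunner client.
// Pairs each input URL with the recognized text at the same index.
function pairTextWithImages(imageUrls, texts) {
  if (imageUrls.length !== texts.length) {
    throw new Error(
      `Expected ${imageUrls.length} text results, got ${texts.length}`
    );
  }
  return imageUrls.map((url, index) => ({ url, text: texts[index] }));
}

// Usage with placeholder data standing in for a real response.
const pages = pairTextWithImages(
  ["https://example.com/page-1.png", "https://example.com/page-2.png"],
  ["Text of page one", "Text of page two"]
);
console.log(pages[1].text); // prints "Text of page two"
```

Checking the lengths up front surfaces a dropped or extra result immediately instead of silently misaligning pages.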
## Advanced Configuration

- `do_format` (boolean, default `false`): when `true`, the recognized text is returned in formatted mode (Markdown/LaTeX), preserving the structure of tables and mathematical/musical notation. Leave it `false` for a plain-text transcript.
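One way to keep the `do_format` choice explicit is to build the request input in a single place. This is a sketch only: `buildOcrInput` and its `structured` option are hypothetical names introduced for illustration, and only `input_image_urls` and `do_format` come from the inputs described above.

```js
// Hypothetical helper: builds the `input` object for a request.
// Only `input_image_urls` and `do_format` are documented inputs.
function buildOcrInput(imageUrls, { structured = false } = {}) {
  if (!Array.isArray(imageUrls) || imageUrls.length === 0) {
    throw new Error("Provide at least one image URL");
  }
  return {
    input_image_urls: [...imageUrls], // copy so the caller's array is not shared
    do_format: structured, // true => Markdown/LaTeX, false => plain text
  };
}

// Plain transcript for a receipt, formatted output for a table (placeholder URLs).
const plain = buildOcrInput(["https://example.com/receipt.jpg"]);
const formatted = buildOcrInput(["https://example.com/table.png"], {
  structured: true,
});
console.log(plain.do_format, formatted.do_format); // prints: false true
```

The returned object can be passed as the `input` field of a request.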
To run via the ModelRunner JavaScript client:

```js
import { modelrunner } from "@modelrunner/client";

const result = await modelrunner.subscribe("stepfun-ai/got-ocr/v2", {
  input: {
    input_image_urls: [
      "https://media.modelrunner.ai/example-document.png"
    ],
    do_format: true,
  },
});
```
