Catalog

Models

Featured models across image, video, audio and text.

Chatterbox Voice Conversion

resemble-ai

Convert a spoken recording into a different target voice while keeping the original words, timing, and delivery — a speech-to-speech voice changer.

textaudio

Bria Eraser

bria

Remove unwanted objects, people, or watermarks from a photo by masking them out, leaving a clean, natural-looking result — with commercially safe outputs.

imageimageedit

DeepFilterNet 3

rikorose

Clean up a noisy speech recording by removing background noise and upsampling it to studio-quality 48 kHz audio.

audioaudiorefine

Ideogram Character

ideogram

Generate new images of the same character from one reference photo and a text prompt, keeping facial features and distinctive traits consistent across scenes.

imageimage

Whisper Large v3

openai

Transcribe or translate speech audio into text across 99 languages, with segment/word timestamps and optional speaker diarization.

speech-to-text

Kling 2.6 Pro Text-to-Video

kuaishou

Generate a short, cinematic video from a text prompt with smooth, fluid motion and strong prompt adherence.

textvideo

SAM Audio — Separate

meta

Isolate any sound from an audio mixture by describing it in plain language.

audioaudiosegmentation

GPT Image 2 Edit

openai

Edit an existing image from a text instruction, with precise instruction-following and accurate, legible in-image text, using up to 16 reference images and an optional mask.

imageimageedit

Imagen 4 Fast

google

Generate high-quality, photorealistic images from a text prompt fast and at the best price, with strong prompt adherence and improved in-image text rendering.

textimage

Imagen 4

google

Generate high-quality, photorealistic images from a text prompt, with strong prompt adherence, improved in-image text rendering, and up to 2K resolution.

textimage

RIFE Video Interpolation

megvii-research

Interpolate new in-between frames to boost a video's frame rate and produce smooth slow-motion.

videovideo

Bria Background Replace

bria

Keep the foreground subject of a photo and generate a brand-new background from a text prompt (or match a reference image), with commercially safe outputs.

imageimageedit

Florence-2 Large Caption

microsoft

Generate a concise one-sentence caption describing any photo — no prompt needed.

imagetextcaption

Clarity Upscaler

philz1337x

Upscale an image to higher resolution while creatively synthesizing fine detail, guided by an optional prompt and adjustable creativity-vs-fidelity controls.

upscalerupscale

Florence-2 Large Object Detection

microsoft

Detect and label every object in a photo and get back an annotated image with bounding boxes drawn on it.

imageimage

Lyria 2

google

Generate ~30 seconds of high-fidelity instrumental music from a text prompt, as a 48kHz WAV file.

textaudio

Segment Anything 2 (Auto-Segment)

meta

Automatically segment a photo into a combined object/region mask — no prompts, points, or clicks needed.

imageimagesegmentationmask

ElevenLabs Sound Effects V2

elevenlabs

Generate sound effects, Foley, and ambience from a text prompt, returning a hosted MP3.

textaudio

FLUX.1 Kontext [dev]

black-forest-labs

Edit an existing image from a text instruction — change objects, style, background, or text — while keeping the rest of the photo consistent.

imageimageedit

CodeFormer

sczhou

Restore and enhance blurry, low-resolution, compressed, or old face photos into a sharp, detailed image.

imageimagerefine

Bria Background Remove

bria

Remove the background from an image and return a transparent-PNG cutout of the subject, trained on fully licensed commercial data.

imageimageremove background

Bria Expand

bria

Expand (outpaint) an image onto a larger canvas, generating new surroundings that match the original — with commercially safe outputs.

imageimageextend

Recraft Vectorize

recraft

Convert a raster image (PNG/JPG/WEBP logo, icon, or sketch) into a clean, infinitely scalable SVG vector file.

imageimagevectorize

Depth Anything V2

depth-anything

Turn a single photo into a grayscale depth map for ControlNet conditioning, 3D, and relighting — no prompt, no tuning.

imageimagedepth estimation

Showing 1–24 of 88 models