Image engine

Can I run this image model?

The image generators covered here work nothing like text models: there is no KV cache and no autoregressive decoding. A denoiser runs for a set number of steps over an image latent, alongside a VAE and one or more text encoders (FLUX and SD3.5 ship a ~4.7B-parameter T5 encoder that's bigger than many LLMs). So memory is the sum of those weights plus activations, and speed, in seconds per image, is steps × time per step. Pick a model and your GPU.
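As a rough illustration of that arithmetic, here is a minimal Python sketch. It is not the calculator's actual code. The names (`ImagePipeline`, `weights_gb`, `fits`, `seconds_per_image`) are hypothetical, the parameter counts in the example are approximate, and the activation budget and per-step time are placeholder assumptions you would replace with measured values. It also assumes every component stays resident on the GPU at once, with no offloading.

```python
from dataclasses import dataclass

# Bytes stored per parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0}


@dataclass
class ImagePipeline:
    """Parameter counts in billions (illustrative, approximate values)."""
    denoiser_params_b: float
    text_encoder_params_b: float
    vae_params_b: float

    def total_params_b(self) -> float:
        return (self.denoiser_params_b
                + self.text_encoder_params_b
                + self.vae_params_b)


def weights_gb(pipeline: ImagePipeline, precision: str) -> float:
    # Billions of parameters times bytes per parameter gives GB (10^9 bytes).
    return pipeline.total_params_b() * BYTES_PER_PARAM[precision]


def fits(pipeline: ImagePipeline, precision: str,
         vram_gb: float, activation_gb: float) -> bool:
    # Memory = sum of all weights + activations (activation_gb is an assumption;
    # in practice it grows with resolution).
    return weights_gb(pipeline, precision) + activation_gb <= vram_gb


def seconds_per_image(steps: int, seconds_per_step: float) -> float:
    # Speed = sampler steps x time per step.
    return steps * seconds_per_step


if __name__ == "__main__":
    # Hypothetical FLUX-like pipeline: ~12B denoiser, ~4.7B T5, small VAE.
    pipe = ImagePipeline(denoiser_params_b=12.0,
                         text_encoder_params_b=4.7,
                         vae_params_b=0.1)
    for precision in ("bf16", "fp8"):
        ok = fits(pipe, precision, vram_gb=24.0, activation_gb=2.0)
        print(f"{precision}: {weights_gb(pipe, precision):.1f} GB weights, "
              f"fits in 24 GB: {ok}")
    print(f"{seconds_per_image(28, 0.5):.1f} s per image")
```

The sketch treats precision as a single multiplier on all weights. Real pipelines often mix precisions or offload the text encoder after encoding the prompt, which would lower the peak.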

Image model

Your GPU

Precision

Resolution

Sampler steps

Pick your GPU above to see if FLUX.1 [dev] fits and how fast it generates.
