Image engine
Can I run this image model?
Image generators work nothing like text models: there is no KV cache and nothing is autoregressive. A denoiser runs for a set number of steps over an image latent, alongside a VAE and one or more text encoders (FLUX and SD3.5 ship a ~4.7B-parameter T5 text encoder that's bigger than many LLMs). So memory is the sum of those weights plus activations, and speed is seconds per image ≈ steps × time per step. Pick a model and your GPU.
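As a rough illustration of that arithmetic, here is a minimal sketch in Python. It is not the calculator's actual implementation: the function names, the flat activation allowance, and the example component sizes (other than the ~4.7B T5 figure mentioned above) are assumptions made for the example.

```python
def estimate_vram_gb(param_counts_b, bytes_per_param=2.0, activation_gb=2.0):
    """Rough memory estimate: weights of every component plus activations.

    param_counts_b: dict of component name -> parameter count in billions.
    bytes_per_param: 2.0 for fp16/bf16, 1.0 for 8-bit weights (assumed uniform).
    activation_gb: flat allowance for activations; a placeholder assumption,
        real usage depends on resolution and batch size.
    """
    # Billions of params * bytes per param gives gigabytes (1e9 bytes) directly.
    weights_gb = sum(param_counts_b.values()) * bytes_per_param
    return weights_gb + activation_gb


def seconds_per_image(steps, seconds_per_step):
    """Speed estimate: number of denoising steps times the time one step takes."""
    return steps * seconds_per_step


def fits(required_gb, gpu_vram_gb):
    """True if the estimate fits in the GPU's memory."""
    return required_gb <= gpu_vram_gb


if __name__ == "__main__":
    # Illustrative component sizes in billions of parameters (assumed values).
    components = {"denoiser": 12.0, "t5_encoder": 4.7, "clip_encoder": 0.12, "vae": 0.08}
    need = estimate_vram_gb(components, bytes_per_param=2.0, activation_gb=2.0)
    print(f"estimated memory: {need:.1f} GB, fits in 24 GB: {fits(need, 24.0)}")
    print(f"estimated time: {seconds_per_image(28, 0.5):.1f} s per image")
```

The sketch assumes every component stays resident on the GPU at once. Offloading the text encoders after prompt encoding, or quantizing the weights, would lower the estimate.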
Pick your GPU above to see if FLUX.1 [dev] fits and how fast it generates.
Running text or vision models? Back to the main calculator →