Pixtral 2409 12B

Instruct

Image

Reasoning

Pixtral 2409 12B is the latest flagship multimodal model from Mistral AI, combining a powerful 12B parameter decoder with a 400M parameter vision encoder. Trained natively on interleaved text and image data, it delivers leading performance across multimodal tasks such as VQA, document understanding, and chart reasoning. With support for variable image sizes and a sequence length of up to 128k tokens, Pixtral sets a new standard in its weight class for both vision-language and text-only benchmarks. Pixtral not only outperforms models of similar size on multimodal evaluations like MMMU, Mathvista, and ChartQA, but also maintains state-of-the-art performance on text-only tasks like MMLU, HumanEval, and mathematical reasoning. Its versatility and performance make it an ideal choice for advanced multimodal agents and instruction-following applications.

Provider	Context Size	Speed	Input Cost	Output Cost

Usage

Generate your API key and query the model through the OpenAI-compatible interface. For more details, see the documentation.