Pixtral 2409 12B is the latest flagship multimodal model from Mistral AI, combining a powerful 12B parameter decoder with a 400M parameter vision encoder. Trained natively on interleaved text and image data, it delivers leading performance across multimodal tasks such as VQA, document understanding, and chart reasoning. With support for variable image sizes and a sequence length of up to 128k tokens, Pixtral sets a new standard in its weight class for both vision-language and text-only benchmarks. Pixtral not only outperforms models of similar size on multimodal evaluations like MMMU, Mathvista, and ChartQA, but also maintains state-of-the-art performance on text-only tasks like MMLU, HumanEval, and mathematical reasoning. Its versatility and performance make it an ideal choice for advanced multimodal agents and instruction-following applications.
Provider
Context Size
Max Output
Latency
Speed
Cost
Data reflects historical performance over the past days.
API Usage
Seamlessly integrate our API into your project by following these simple steps: