DeepSeek V4 Flash

InstructToolsReasoning

DeepSeek-V4-Flash is an efficiency-focused Mixture-of-Experts (MoE) language model from the DeepSeek-V4 series, with 284B total parameters and 13B activated per token, making it one of the lowest activation-footprint Tier-1 models. It supports a 1 million-token context window, enabling processing of entire codebases, long documents, and extended agent sessions while preserving context. The model uses a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), along with Manifold-Constrained Hyper-Connections to improve efficiency and stability. At 1M context, it achieves around 10% of DeepSeek-V3.2’s inference FLOPs and 7% of its KV cache usage, delivering significantly reduced compute cost and faster inference while maintaining near-equivalent reasoning performance to the larger V4-Pro model.

Provider	Context Size	Speed	Input Cost	Output Cost	Cache Cost

Usage

Generate your API key and query the model through the OpenAI-compatible interface. For more details, see the documentation.

Enter ↵