DeepSeek-V4 is one of the most ambitious open-weight model releases from DeepSeek so far. The family comprises DeepSeek-V4-Pro, a 1.6T-parameter Mixture-of-Experts (MoE) model that activates 49B parameters per token, and DeepSeek-V4-Flash, a smaller 284B-parameter MoE model that activates 13B. Both models support context lengths of up to one million tokens.
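To put the total-versus-activated parameter counts in perspective, a quick sketch of the arithmetic (using only the figures quoted above; the function name is illustrative):

```python
def activation_ratio(total_params: float, active_params: float) -> float:
    """Fraction of an MoE model's parameters activated for each token."""
    return active_params / total_params

# Parameter counts from the release notes above.
pro = activation_ratio(1.6e12, 49e9)    # DeepSeek-V4-Pro
flash = activation_ratio(284e9, 13e9)   # DeepSeek-V4-Flash

print(f"Pro:   {pro:.1%} of parameters active per token")    # ~3.1%
print(f"Flash: {flash:.1%} of parameters active per token")  # ~4.6%
```

In other words, each forward pass touches only a few percent of the weights, which is how MoE models of this scale keep per-token compute closer to that of a much smaller dense model.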