2025-01-20

DeepSeek-R1: A New Star in AI

On January 20, 2025, DeepSeek open-sourced its first-generation reasoning models, DeepSeek-R1 and DeepSeek-R1-Zero. Coming just after the release of its MoE (Mixture-of-Experts) model DeepSeek-V3 last December, the launch shook up the AI world once again.

DeepSeek-R1 doesn't just cut training costs; it also makes a big leap forward in capability, challenging the old idea that open-source models can't keep up with their closed-source rivals. It even matches top-tier models like OpenAI-o1 on multiple benchmarks.

In addition to open-sourcing the models, DeepSeek has released its training methods and published detailed research papers, making it easier to understand the model's innovations and its broader impact.

What's new?

Open-sourced and flexible deployment
  • DeepSeek has fully open-sourced the model, which can be directly downloaded and run locally. Additionally, users are permitted to use R1 to train other models. There are even mini models available that can run on mobile devices.
  • Supports a range of flexible private deployment options.
Excellent cost-effectiveness
  • Low computational power consumption
    • Training a 1.5B model requires only 7GB of GPU memory.
  • Low training cost
    • Training cost was just $5.576 million, far below the hundreds of millions or even billions of dollars that other tech giants often invest.
  • Low hardware requirements
    • DeepSeek-R1 has low hardware requirements and can run efficiently on relatively modest machines.
  • Low usage cost
    • The API service is priced at $0.14 per million input tokens (cache hit) / $0.55 per million input tokens (cache miss), and $2.19 per million output tokens. Compared with OpenAI-o1's pricing of $15 per million input tokens and $60 per million output tokens, it is far more cost-effective (a quick cost comparison is sketched right after this list).
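
As a rough illustration of the price gap, here is a minimal cost-comparison sketch using the per-million-token rates quoted above; the workload numbers (token counts, cache-hit ratio) are hypothetical example values, not DeepSeek's figures.

```python
# Rough API cost comparison based on the per-million-token prices quoted above.
# The workload (token counts, cache-hit ratio) is a made-up example.

def deepseek_r1_cost(input_tokens, output_tokens, cache_hit_ratio=0.0):
    """Estimated DeepSeek-R1 API bill in USD."""
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    return (hit / 1e6) * 0.14 + (miss / 1e6) * 0.55 + (output_tokens / 1e6) * 2.19

def openai_o1_cost(input_tokens, output_tokens):
    """Estimated OpenAI-o1 API bill in USD."""
    return (input_tokens / 1e6) * 15.0 + (output_tokens / 1e6) * 60.0

# Example: 10M input tokens (half served from cache) and 2M output tokens.
r1 = deepseek_r1_cost(10_000_000, 2_000_000, cache_hit_ratio=0.5)
o1 = openai_o1_cost(10_000_000, 2_000_000)
print(f"DeepSeek-R1: ${r1:.2f}  OpenAI-o1: ${o1:.2f}  (~{o1 / r1:.0f}x cheaper)")
```
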
Technical Innovation
  • Novel Reinforcement Learning Algorithm:
    • DeepSeek uses GRPO (Group Relative Policy Optimization), a reinforcement learning method that drops the separate value (critic) network used to estimate baselines. Instead, it estimates each response's advantage relative to a group of responses sampled for the same prompt, reducing memory and compute costs while improving training stability and mathematical reasoning capability (a minimal sketch of this group-relative advantage appears after this list).
  • Innovative Training Method:
    • Unlike traditional pipelines that rely on large numbers of Chain-of-Thought (CoT) examples in Supervised Fine-Tuning (SFT), or on complex neural reward models to teach the model how to think, DeepSeek-R1-Zero follows a pure reinforcement learning (RL) path with simple rule-based rewards. This lets the model learn to reason autonomously, producing more pronounced emergent reasoning behaviors (a hypothetical sketch of such a rule-based reward follows the GRPO example below).
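
To make the "group comparison instead of a value network" idea concrete, here is a minimal sketch of a GRPO-style advantage computation: sample a group of responses for the same prompt, score each one, and normalize its reward against the group's mean and standard deviation. The function name and reward values are illustrative assumptions, not DeepSeek's actual code.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage estimate: each response in a group sampled for the
    same prompt is scored relative to the group mean and standard deviation,
    replacing the separate value (critic) network used in PPO."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 responses sampled for one prompt, scored 1.0 if correct, 0.0 otherwise.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
# Above-average responses get positive advantages, below-average ones negative.
```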
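
The paper describes R1-Zero's training signal as simple rule-based rewards (answer accuracy plus a format check on the reasoning tags) rather than a learned neural reward model. The snippet below is a hypothetical sketch of that style of reward; the tag names, extraction logic, and weights are assumptions for illustration.

```python
import re

def rule_based_reward(response: str, reference_answer: str) -> float:
    """Hypothetical rule-based reward in the spirit of R1-Zero:
    deterministic checks instead of a neural reward model."""
    # Format check: reasoning wrapped in <think> tags, answer in <answer> tags.
    format_ok = bool(re.search(r"<think>.*?</think>\s*<answer>.*?</answer>",
                               response, flags=re.DOTALL))
    # Accuracy check: compare the extracted final answer with the reference.
    match = re.search(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    answer_ok = match is not None and match.group(1).strip() == reference_answer.strip()
    # Illustrative weights; the real reward design may differ.
    return 1.0 * answer_ok + 0.2 * format_ok

print(rule_based_reward("<think>2 + 2 = 4</think> <answer>4</answer>", "4"))  # 1.2
```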