On January 20, 2025, DeepSeek released R1, a model that took the AI world by surprise. With 671 billion parameters, roughly 1.65 times the size of Llama 3.1's largest 405B variant, it stands toe-to-toe with OpenAI's o1, particularly in areas like math and coding. But that's not the real reason everyone's talking about it.
Why is DeepSeek disrupting the AI industry?
According to DeepSeek, R1 was developed with about $5.6 million, roughly 2,000 GPUs, and 55 days of training, a fraction of the estimated $100 million spent training OpenAI's GPT-4. Set against the $500 billion pledged for the Stargate Project, that figure looks remarkably low.
Another key factor is hardware accessibility. Since October 2022, the U.S. has restricted the export to China of Nvidia's most advanced GPUs, including the H100 chips widely used by companies like OpenAI, Anthropic, and Meta for AI training. This led to the assumption that building high-performance AI models required access to these chips.
DeepSeek’s R1 challenges that notion, demonstrating that competitive AI models can be developed using the less powerful H800 chips while keeping costs relatively low. Additionally, R1 is free to use and open-source, making it accessible for developers to modify and build upon.
By focusing on efficiency and accessibility, DeepSeek’s R1 introduces an alternative approach to AI development—one that could influence how future models are built.
The impact was immediate. R1's release triggered a selloff in U.S. tech stocks, with Nvidia, the key supplier of high-performance AI chips, plunging 17% in a single day as investors reacted to the news.
How did DeepSeek achieve high performance with lower computing power?
DeepSeek R1 stands out not only for its capabilities but also for how efficiently it was built. Instead of relying on massive computational resources, it leverages several optimization techniques to reduce computing costs while maintaining strong performance.
1. FP8 Precision Instead of FP32
Most deep learning models store weights and activations in 32-bit floating point (FP32). That precision is accurate, but it demands substantial memory and computing power.
DeepSeek R1, by contrast, performs much of its computation in 8-bit floating point (FP8), a far more compact format. This speeds up training and inference while cutting per-value memory use by roughly 75% compared with FP32, and largely preserves accuracy.
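To make the idea concrete, here is a minimal sketch (not DeepSeek's actual training code) of casting FP32 weights to the FP8 E4M3 format in PyTorch, assuming a recent PyTorch build that ships the torch.float8_e4m3fn dtype. Each value shrinks from 4 bytes to 1 byte, at the cost of some rounding error.

```python
import torch

# 32-bit weights: 4 bytes per value
weights_fp32 = torch.randn(1024, 1024)

# Cast to 8-bit floats (E4M3): 1 byte per value, i.e. ~75% less memory
weights_fp8 = weights_fp32.to(torch.float8_e4m3fn)

print(weights_fp32.element_size(), "bytes per FP32 value")  # 4
print(weights_fp8.element_size(), "bytes per FP8 value")    # 1

# Round-trip back to FP32 to see the precision lost to the smaller format
roundtrip = weights_fp8.to(torch.float32)
print("mean absolute rounding error:", (weights_fp32 - roundtrip).abs().mean().item())
```

In real training pipelines, FP8 values are typically paired with scaling factors so that they stay within the format's narrow numeric range.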
2. Multi-Token Prediction Instead of Single-Token Processing
Traditional AI models generate responses one token at a time, with each prediction depending on the previous one. This means a separate computation step is required for every token, making the process slower.
DeepSeek R1, in contrast, predicts multiple tokens in parallel, reducing dependency on previous tokens. This significantly improves processing speed without major accuracy trade-offs.
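The sketch below illustrates the general idea with hypothetical sizes (it is not DeepSeek's architecture): several output heads share the same hidden state, so one forward pass produces predictions for more than one future position.

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size, num_future_tokens = 512, 32000, 3

# One projection head per future position (+1, +2, +3)
heads = nn.ModuleList(
    [nn.Linear(hidden_dim, vocab_size) for _ in range(num_future_tokens)]
)

hidden_state = torch.randn(1, hidden_dim)  # last hidden state of the sequence

# All heads run on the same state, so the predictions come out together
logits_per_offset = [head(hidden_state) for head in heads]
predicted_tokens = [logits.argmax(dim=-1).item() for logits in logits_per_offset]
print(predicted_tokens)  # e.g. the ids for tokens t+1, t+2, t+3
```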
3. Mixture of Experts (MoE) Framework
Despite having 671 billion parameters, DeepSeek R1 does not activate all of them at once. Instead, it employs a Mixture of Experts (MoE) framework, in which only a fraction of the model (around 37 billion parameters) is activated for each request.
A gate system dynamically selects the most relevant experts for each input, leading to:
- Lower computational costs since only necessary parts of the model are active.
- More efficient scaling, as increasing parameters doesn’t proportionally increase compute requirements.
- The ability to run on limited hardware while still delivering strong performance.
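Below is a minimal sketch of how such a gating layer might look in PyTorch, using made-up sizes rather than DeepSeek's real configuration: the gate scores every expert, but only the top-k experts actually run on each input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):
        gate_scores = F.softmax(self.gate(x), dim=-1)            # score every expert
        weights, indices = gate_scores.topk(self.top_k, dim=-1)  # keep only the best k
        out = torch.zeros_like(x)
        for i in range(x.size(0)):                               # per input example
            for slot in range(self.top_k):                       # only k experts run
                expert = self.experts[indices[i, slot].item()]
                out[i] += weights[i, slot] * expert(x[i])
        return out

layer = TinyMoE()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

With 8 experts and top_k=2, only a quarter of the expert parameters are exercised per input, which is the same reason R1 can hold 671 billion parameters while activating far fewer per token.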
4. Group Relative Policy Optimization vs Proximal Policy Optimization (GRPO vs PPO)
Unlike PPO, which relies on a value function to estimate expected rewards, GRPO eliminates the need for this component. Instead, it evaluates multiple responses to a given prompt simultaneously, assessing their relative quality within the group.
This group-based evaluation allows the model to learn more effectively from a set of possible outputs, leading to more efficient training and reduced computational overhead.
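A minimal sketch of the group-relative step, using made-up reward values rather than anything from DeepSeek's pipeline: each sampled response's advantage is its reward normalized against the group's mean and standard deviation, which replaces the learned value function PPO would use as a baseline.

```python
import torch

# Rewards assigned to four sampled responses for the same prompt (assumed values)
group_rewards = torch.tensor([0.2, 0.9, 0.5, 0.1])

# Group-relative advantages: responses better than the group average get
# positive values, worse ones get negative values
advantages = (group_rewards - group_rewards.mean()) / (group_rewards.std() + 1e-8)
print(advantages)

# These advantages then weight each response's policy-gradient update,
# with no separate value model needed to estimate expected reward.
```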
5. Possible Use of Knowledge Distillation
DeepSeek R1 may have also leveraged knowledge distillation, a technique in which a smaller, more efficient model learns from a larger, more powerful one. This allows R1 to:
- Retain much of the intelligence of a larger model while using fewer resources.
- Benefit from prior investments in large-scale AI training without the same computational burden.
- Improve efficiency without significantly compromising performance.
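For illustration only, and not as evidence of what DeepSeek actually did, the sketch below shows the standard distillation loss: the student is trained to match the teacher's softened output distribution via KL divergence. The logits here are random stand-ins for real model outputs.

```python
import torch
import torch.nn.functional as F

vocab_size, temperature = 32000, 2.0

teacher_logits = torch.randn(1, vocab_size)                      # large "teacher" model
student_logits = torch.randn(1, vocab_size, requires_grad=True)  # small "student" model

# Soften both distributions with a temperature, then match them with KL divergence
teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)

loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2
loss.backward()  # gradients pull the student toward the teacher's behaviour
print(loss.item())
```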
The controversies surrounding DeepSeek
While DeepSeek R1 has gained attention for its efficiency and open-source approach, it has also faced scrutiny over data privacy, security practices, and potential misuse of proprietary AI models.
1. Data privacy and security concerns
Security researchers from cloud security firm Wiz discovered that DeepSeek's database was exposed, containing sensitive user data, including chat histories and API authentication keys. While DeepSeek quickly secured the database after being alerted, the incident raised concerns about its data handling practices and overall security measures. Additionally, regulatory bodies have taken action:
- Italy’s data protection authority (Garante) banned DeepSeek’s app due to insufficient transparency on personal data usage.
- The U.S. Navy also prohibited its use, citing security and ethical concerns.
Meanwhile, OpenAI launched ChatGPT Gov, a specialized version designed for U.S. government agencies, capitalizing on the rising privacy concerns surrounding DeepSeek.
2. Allegations of Unauthorized Use of OpenAI’s Models
OpenAI claims to have evidence that DeepSeek harvested outputs from its proprietary models to train its own. This was allegedly done using distillation, a technique where a smaller model learns from a more powerful one, improving efficiency while significantly reducing training costs.
Microsoft’s security researchers flagged the issue after observing individuals linked to DeepSeek extracting large amounts of data from OpenAI’s systems.
While OpenAI allows developers to integrate its AI via API, using its outputs to train competing models violates its terms of service. Both OpenAI and Microsoft are currently investigating the situation.
This has sparked broader discussions about AI security and intellectual property protection, with White House AI advisor David Sacks commenting:
“I think one of the things you’re going to see over the next few months is our leading AI companies taking steps to try and prevent distillation.”
3. Doubts Over DeepSeek’s Reported Development Costs and Chip Usage
DeepSeek claims that R1 was developed with just $5.6 million, but industry reports indicate that this figure only accounts for the direct training costs, such as computing power and electricity, while excluding other significant expenses such as research and development, data gathering and cleaning, staff salaries, and infrastructure.
Additionally, US officials are investigating whether DeepSeek may have acquired advanced Nvidia chips through third parties in Singapore or intermediaries in Southeast Asia, potentially circumventing export restrictions.
These findings raise concerns about the true scale of DeepSeek’s investments and whether its AI development was as cost-efficient as claimed.
How to try DeepSeek?
DeepSeek R1 is accessible across multiple platforms, including web, iOS, and Android. Simply download the app or visit the website, sign up, and start using it for free.
If you have concerns about data privacy and security when using DeepSeek R1, a secure alternative is Hypotenuse AI—an all-in-one AI platform designed for ecommerce brands to manage product data and create product content. While it’s not specialized in coding and math, it offers features such as bulk product description generation, marketing content creation, and product data enrichment.
For addressing general inquiries, Hypotenuse AI’s chatbot, HypoChat, provides answers with sources from the web. You can sign up for a free trial here.