Trillion-dollar disruptor: China’s DeepSeek upends AI world overnight

DeepSeek’s open-source model shows that the US way is not the only AI way. Depositphotos

This game-changing event comes on the back of the company’s latest AI model – DeepSeek-R1 – being released for use on smartphones across the globe, following its desktop launch on January 10.

DeepSeek has been on our radar for a few weeks, ever since its V3 chatbot dropped on December 26 and was reported to perform as well as the leading US GPTs (generative pre-trained transformers) – something few news outlets, ourselves included, covered at the time. With the AI frontrunners – all US companies – developing new features at breakneck speed, it was hard to imagine that this unheard-of large language model (LLM), even one that looked impressive on paper and was fundamentally different in many ways, could rock the boat.

But that all changed overnight on January 27, 2025. As China woke on the day before Lunar New Year’s Eve, DeepSeek had become the #1 app in the AI/GPT world and decimated the stock prices of the industry’s who’s who: as well as Nvidia and OpenAI, scalps included Meta, Google’s parent company Alphabet, Nvidia partner Oracle, and many other energy and data-center firms. Elon Musk dodged this bullet – but only because X is no longer listed on the market.

While the market downturn is no doubt a temporary one, DeepSeek has permanently altered the path of the AI timeline. Until now, the US has been so far ahead in the field that all we really expected to see were poor imitations of the ‘gold standard’ models. And this is why DeepSeek is so interesting, because it’s forged its own path, setting up China as a new player in what some are now calling a digital arms race.

The company’s LLM was built using old Nvidia chips for a fraction of the cost invested by the likes of Anthropic and OpenAI in their respective models. Depositphotos

What makes it so different comes down to a number of things: it was trained on older, cheaper chips, and it cut out several of the costly steps that have, until now, been the standard route for building chatbots. Because of this, development reportedly cost US$5.6 million in rented hardware to train the model, compared with an estimated $60 million for Llama 3.1 405B, which also used 11 times the computing resources. GPT-4 cost more than $100 million, and Microsoft has said it plans to spend $80 billion on AI development in 2025. R1 is also open source, rather than closely guarded and proprietary, which in turn helps DeepSeek navigate regional restrictions.

Overall, this has triggered a kind of existential crisis for the US-dominated industry – because what if a model could be produced for a fraction of the cost, and trained more efficiently, and be just as good, if not better?

“There are a few things to know about this one,” said Casey Newton, one of the hosts of the Hard Fork podcast on January 10. “One is that it’s really big; it has more than 680 billion parameters, which makes it significantly bigger than the largest model in Meta’s Llama series, which I would say up to this point has been the gold standard for open models. That one has 405 billion parameters.

“But the really, really important thing about DeepSeek is that it was trained at a cost of US$5.5 million,” he continued. “And so what that means is you now have an LLM that is about as good as the state-of-the-art [AIs] that was trained for a tiny fraction of what something like Llama or ChatGPT was trained for.”

To understand why DeepSeek is so significant, you have to look at where it came from. Its developer, quantitative – or ‘quant’ – trader Liang Wenfeng, bought up thousands of Nvidia chips back in 2021 for a ‘side project’ alongside his day job at the helm of High-Flyer, one of the Chinese market’s largest hedge funds. The 40-year-old financier used these chips to build algorithms and mathematical models to help predict market trends and steer investments; DeepSeek itself was only established in 2023.

“When we first met him, he was this very nerdy guy with a terrible hairstyle talking about building a 10,000-chip cluster to train his own models,” one of Liang’s business partners told the Financial Times. “We didn’t take him seriously. He couldn’t articulate his vision other than saying: ‘I want to build this, and it will be a game changer.’ We thought this was only possible from giants like ByteDance and Alibaba.”

Less than two years after DeepSeek was founded, the maker of those chips – Nvidia – saw $593 billion wiped from its market value overnight thanks to Liang’s side project, the biggest single-day loss in US market history. (Incidentally, export of advanced Nvidia chips has since been restricted – yet DeepSeek-V3 was trained on cheaper, older Nvidia H800 hardware.)

What makes DeepSeek’s R1 model such a game-changer is its unorthodox training (and, in turn, the money saved in the process). This fantastic explainer covers a recent research paper released by the company, which essentially details how DeepSeek bypassed the traditional supervised fine-tuning stage of LLM development and instead focused on the AI’s “self-evolution through a pure reinforcement learning process.”

“We demonstrate that reasoning capabilities can be significantly improved through large-scale reinforcement learning (RL), even without using supervised fine-tuning (SFT) as a cold start,” the DeepSeek researchers wrote in the January paper. “Furthermore, performance can be further enhanced with the inclusion of a small amount of cold-start data.”
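
To make that idea concrete, here’s a toy sketch of what “pure reinforcement learning” means in this context: a policy that is never shown worked examples (no supervised fine-tuning) and improves only from a rule-based reward that checks whether its sampled answer is correct. This is a deliberately tiny REINFORCE-style illustration in PyTorch – not DeepSeek’s actual training pipeline – and every name in it is illustrative.

```python
# Toy "pure RL" loop: no supervised examples, only a rule-based reward.
# Task: answer single-digit additions. Illustrative only.
import torch

torch.manual_seed(0)

# A tiny policy: logits over the 19 possible sums (0..18), given (a, b).
policy = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.ReLU(), torch.nn.Linear(64, 19)
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(2000):
    a, b = torch.randint(0, 10, (2,)).tolist()       # sample a "question"
    logits = policy(torch.tensor([a, b], dtype=torch.float32))
    dist = torch.distributions.Categorical(logits=logits)
    guess = dist.sample()                            # the model's "answer"
    reward = 1.0 if guess.item() == a + b else 0.0   # rule-based check
    # REINFORCE: raise the log-probability of rewarded answers.
    loss = -dist.log_prob(guess) * reward
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The real R1 recipe swaps this toy for a full LLM generating chain-of-thought text, scored by programmatic checks (did the math come out right, does the code run) under a group-relative policy-optimization scheme – but the shape of the loop, sample-score-reinforce with no human-labeled answers, is the same.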

While this is unlikely to rock the world of everyday LLM users, most of whom are casually interacting with the likes of Google’s Gemini or Anthropic’s Claude, it stands as a defining moment in the development of this technology. Which brings us to another aspect of DeepSeek’s business model that sets it apart – and has the industry rattled: access.

As Nature’s Elizabeth Gibney wrote on January 23, DeepSeek-R1 is released as “open weight,” which means researchers can study and build on it. By comparison, existing market-leading models are what researchers deem a “black box” – a closed-off system controlled by its developers. Open weights pave the way for scientists to harness an existing model for their own uses, rather than building from the ground up.

“DeepSeek hasn’t released the full cost of training R1, but it is charging people using its interface around one-30th of what [OpenAI’s] o1 costs to run,” Gibney noted. “The firm has also created mini ‘distilled’ versions of R1 to allow researchers with limited computing power to play with the model.”
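
Those distilled checkpoints, combined with the open-weight release, mean anyone can run the model locally. As a minimal sketch of what that looks like in practice, the snippet below loads one of the small distilled models with the Hugging Face transformers library – treat the exact model ID and generation settings as assumptions to verify against DeepSeek’s official model cards.

```python
# Minimal sketch: run a small distilled DeepSeek-R1 checkpoint locally.
# Assumes `pip install transformers torch`; the model ID below should be
# checked against the deepseek-ai organization on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Reasoning models are typically prompted through their chat template.
inputs = tok.apply_chat_template(
    [{"role": "user", "content": "What is 17 * 24? Think step by step."}],
    add_generation_prompt=True,
    return_tensors="pt",
)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the weights sit on the researcher’s own machine, they can inspect internals, fine-tune on domain data, or distill further – none of which an API-only “black box” allows.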

However, as DeepSeek triggered the market rout on January 27, it was itself hit by cyberattacks aimed at knocking out its servers.

“Due to large-scale malicious attacks on DeepSeek’s services, we are temporarily limiting registrations to ensure continued service,” the company posted on its status page. “Existing users can log in as usual. Thanks for your understanding and support.”

As of writing, DeepSeek-R1 can still be downloaded and the site remains accessible, but new registrations are restricted to residents of China with a local phone number.

Meanwhile, a somewhat inevitable backlash is now under way, with countless news outlets, including Forbes, noting that DeepSeek-R1 is hampered by censorship, stonewalling questions that would invite criticism of China. Silicon Valley startup Perplexity AI – which currently has its sights set on a merger with TikTok’s US business – was briefly hosting an “uncensored” search engine powered by DeepSeek-R1, but this too has been taken offline.

Regardless of how this plays out in the coming days and weeks, one thing is certain: DeepSeek, in a few short weeks, has singlehandedly shifted the course of AI development.

“The emergence of DeepSeek is a significant moment in the AI revolution,” said Professor Geoff Webb, from the Department of Data Science & AI at Monash University in Australia. “Until now it has seemed that billion-dollar investments and access to the latest generation of specialized Nvidia processors were prerequisites for developing state-of-the-art systems. This effectively limited control to a small number of leading US-based tech corporations.”

He adds that if DeepSeek’s claims are all true, “it means that the US tech sector no longer has exclusive control of the AI technologies, opening them to wider competition and reducing the prices they can charge for access to and use of their systems.”

Webb then makes an important point that few people are talking about: The monopolization of AI by a handful of powerful players in the US – further consolidated by government-legislated export restrictions on crucial Nvidia hardware – essentially denies the rest of the world a stake in the most significant technological advancement since the internet.

“Looking beyond the implications for the stock market, current AI technologies are US-centric and embody US values and culture,” he added. “This new development has the potential to create more diversity through the development of new AI systems. It also has the potential to make AI more accessible for researchers around the world, both for developing new technologies and for applying them in diverse areas including healthcare.”
