DeepSeek's surprisingly inexpensive AI model is challenging industry giants. The Chinese startup claims to have trained its powerful DeepSeek V3 neural network for a mere $6 million using only 2,048 GPUs, a stark contrast to competitors' significantly higher costs. That figure, however, omits substantial expenses such as research, refinement, data processing, and infrastructure.
DeepSeek's approach relies on several key technologies: Multi-token Prediction (MTP), which predicts several tokens at once to improve accuracy and efficiency; Mixture of Experts (MoE), which employs 256 expert neural networks to accelerate training; and Multi-head Latent Attention (MLA), which helps the model focus on the most important parts of a sentence.
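To make the Mixture of Experts idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a generic illustration, not DeepSeek's actual router: the softmax gating, the toy scalar "experts", the random router scores, and the choice of 8 active experts per input are all assumptions made for the example. Only the figure of 256 experts comes from the article.

```python
import math
import random

NUM_EXPERTS = 256  # figure quoted in the article
TOP_K = 8          # assumption for illustration; not stated in the article


def top_k_route(scores, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are a softmax over the selected scores only, so they sum to 1.
    """
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    max_score = max(scores[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(scores[i] - max_score) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, router_scores, k=TOP_K):
    """Combine the outputs of only the k selected experts for input x."""
    routing = top_k_route(router_scores, k)
    return sum(weight * experts[i](x) for i, weight in routing)


# Hypothetical toy experts: each one just scales its input by a different factor.
experts = [lambda x, s=i / NUM_EXPERTS: s * x for i in range(NUM_EXPERTS)]

# Hypothetical router scores; a real model computes these from the input token.
rng = random.Random(0)
router_scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routing = top_k_route(router_scores, TOP_K)
output = moe_forward(1.0, experts, router_scores)
print(f"experts evaluated: {len(routing)} of {NUM_EXPERTS}")
print(f"combined output: {output:.4f}")
```

The point of the design is that only the selected experts run for a given input, so compute per input grows with the number of active experts rather than with the total number of experts.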
Image: ensigame.com
Contrary to the publicized figures, SemiAnalysis reports that DeepSeek operates a massive computational infrastructure of approximately 50,000 Nvidia Hopper GPUs spread across multiple data centers, including 10,000 H800, 10,000 H100, and additional H20 units. The total server investment is roughly $1.6 billion, with operating costs estimated at about $944 million.
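For a sense of scale, the short Python sketch below sets the claimed training budget against the reported infrastructure figures. Every input is a number quoted in this article; the derived ratios are plain arithmetic, and the variable names are made up for the example. This is a rough scale comparison rather than like-for-like accounting, since a training-run cost and a hardware investment measure different things.

```python
# Figures quoted in the article (USD and GPU counts); everything else is derived.
claimed_training_cost = 6_000_000    # claimed cost of training DeepSeek V3
training_gpus = 2_048                # GPUs reportedly used for that run
fleet_gpus = 50_000                  # approximate Hopper GPUs per SemiAnalysis
server_investment = 1_600_000_000    # approximate total server investment
operating_costs = 944_000_000        # approximate operating costs

# Average server investment per GPU across the reported fleet.
capex_per_gpu = server_investment / fleet_gpus

# Fraction of the reported fleet that the training run would have occupied.
training_share_of_fleet = training_gpus / fleet_gpus

# Claimed training cost as a fraction of the server investment.
claimed_vs_capex = claimed_training_cost / server_investment

print(f"server investment per GPU: ${capex_per_gpu:,.0f}")
print(f"training GPUs as share of fleet: {training_share_of_fleet:.1%}")
print(f"claimed training cost vs. server investment: {claimed_vs_capex:.3%}")
print(f"servers plus operations: ${server_investment + operating_costs:,.0f}")
```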
DeepSeek is a subsidiary of High-Flyer, a Chinese hedge fund. Unlike cloud-reliant competitors, it owns its data centers, which fosters faster innovation and optimization. Because the company is self-funded, it can stay agile and make decisions quickly. DeepSeek also attracts top talent, recruited primarily from Chinese universities, with some researchers earning over $1.3 million annually.
DeepSeek's $6 million training cost is misleading: the company's overall investment exceeds $500 million. Its lean structure facilitates efficient innovation, in contrast to larger, more bureaucratic companies. Substantial investment, technological advances, and a skilled team are the keys to its success, not a "revolutionary budget" alone. Even so, the cost disparity is evident: DeepSeek reportedly spent $5 million on its R1 model, while training OpenAI's GPT-4o reportedly cost $100 million.
DeepSeek's story highlights the potential of well-funded, independent AI companies to compete effectively, though the narrative of exceptionally low costs requires careful scrutiny.