NVIDIA Unveils Blackwell Ultra to Dominate AI Inferencing Market

NVIDIA is set to release its next-generation Blackwell Ultra AI chips in the second half of 2025, designed specifically to excel at AI inferencing, where competitors like Amazon, AMD, and Broadcom are gaining ground. The new chips promise up to 1.5 times more AI compute performance than current Blackwell GPUs, along with significantly expanded memory capacity. This strategic move could help NVIDIA maintain its dominance in the rapidly growing AI inference market, which is expected to eventually dwarf the training market in size.

NVIDIA is preparing to strengthen its position in the AI chip market with the upcoming release of its Blackwell Ultra architecture, a significant upgrade to the Blackwell platform the company introduced in 2024.

While NVIDIA has dominated the AI training market, the company faces increasing competition in the inference space, where AI models are deployed to generate responses rather than being trained. As AI applications become more complex and widespread, industry experts predict the inference market will grow dramatically over the next few years, attracting competitors eager to challenge NVIDIA's dominance. Unlike AI training, which requires enormous computing power across entire data centers, inference workloads are more diverse and can be handled by various specialized hardware.

The Blackwell Ultra-based products are expected to be available from partners starting in the second half of 2025. Major cloud providers including Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure will be among the first to offer Blackwell Ultra-powered instances, with server manufacturers like Dell, HPE, Lenovo, and Supermicro following with their own implementations.

The new architecture leverages NVIDIA's second-generation Transformer Engine with custom Blackwell Tensor Core technology, combined with TensorRT-LLM and NeMo Framework innovations to accelerate both inference and training for large language models. Blackwell Ultra Tensor Cores deliver 2X the attention-layer acceleration and 1.5X more AI compute FLOPS compared to standard Blackwell GPUs.
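For context, the serving side of that software stack is exposed to developers through TensorRT-LLM's high-level Python API. The following is a minimal sketch of what serving a model through that API looks like; the model name and sampling settings are illustrative placeholders rather than details from the article, and exact arguments vary between TensorRT-LLM releases.

# Minimal sketch of LLM inference via TensorRT-LLM's high-level Python API.
# Assumes tensorrt-llm is installed on a supported NVIDIA GPU; the model
# name below is an illustrative placeholder, not one named in the article.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds (or loads) an optimized engine for the model; on Blackwell-class
    # hardware, this is where the Transformer Engine's attention kernels and
    # low-precision Tensor Core paths are applied under the hood.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    outputs = llm.generate(["What is AI inference?"], sampling)
    for out in outputs:
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()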

According to NVIDIA, the Blackwell Ultra family boasts up to 15 petaFLOPS of dense 4-bit floating-point performance and up to 288 GB of HBM3e memory per chip. This is particularly significant for AI inference, which is primarily a memory-bound workload: the more memory available, the larger the model that can be served. Ian Buck, NVIDIA's VP of hyperscale and HPC, claims the Blackwell Ultra will enable reasoning models to be served at 10x the throughput of the previous Hopper generation, reducing response times from over a minute to as little as ten seconds.
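To make the memory point concrete, here is a back-of-the-envelope sketch in Python of how much memory a model's weights alone require at different precisions. The 405-billion-parameter figure is an illustrative open-model size, not one cited in the article, and real deployments also need headroom for the KV cache and activations.

def weight_footprint_gb(params_billion: float, bits_per_param: int) -> float:
    # Weights only: parameter count times bits per parameter, in decimal GB.
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for bits, label in [(16, "FP16"), (8, "FP8"), (4, "FP4")]:
    print(f"405B params at {label}: {weight_footprint_gb(405, bits):.0f} GB")

# Roughly 810 GB at FP16, 405 GB at FP8, and 202 GB at FP4; only the
# 4-bit version fits within the 288 GB of a single Blackwell Ultra chip,
# which is why 4-bit compute and larger memory matter for inference.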

NVIDIA faces growing competition from AMD, which recently launched its MI300 series for AI workloads and has gained adoption from companies seeking alternatives amid NVIDIA's supply constraints. In 2025, AMD announced it was acquiring AI hardware and software engineers from Untether AI to strengthen its inference capabilities. Amazon, meanwhile, is demonstrating ambitions to control the entire AI infrastructure stack with its Graviton4 CPUs and the Trainium chips behind Project Rainier, on which Anthropic has successfully trained major AI models such as Claude 4 without NVIDIA hardware.

Despite these challenges, analysts project NVIDIA's data center sales will grow to $200 billion in 2025, with the company holding roughly 80-85% market share in the near term. NVIDIA's strategy with Blackwell Ultra appears focused on securing its position in the inference market while continuing to innovate in training capabilities, even as rivals chip away at the assumption that top AI models must rely exclusively on NVIDIA hardware.

Source: MIT Technology Review
