Jensen Huang had a habit, during the most intense period of AI investment frenzy in 2023 and 2024, of showing up to events in his signature black leather jacket and presenting slides that made competitors wince. Quarterly earnings calls that other chip companies approached with careful hedging became, for Nvidia, occasions to raise guidance by margins that analysts struggled to process. Data center revenue that had been measured in billions became tens of billions, then approached a hundred billion annually. The stock, already extraordinary, kept going.
The source of this exceptional position is straightforward to describe and extraordinarily difficult to replicate. Nvidia's graphics processing units — GPUs, chips originally designed to render video game graphics — turned out to be almost perfectly suited for the matrix multiplication operations that underlie modern AI training. When deep learning began demonstrating remarkable capabilities in the early 2010s, Nvidia hardware was there. The company invested aggressively in software — particularly its CUDA programming framework, which gave developers tools to write code that ran on Nvidia chips with relative ease. By the time the AI boom arrived in full force, Nvidia had a hardware advantage, a software ecosystem advantage, and a talent pipeline advantage that competitors were years behind on. The result was something close to a monopoly on the hardware powering the most consequential technology of the era.
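To make the matrix multiplication point concrete, here is a minimal PyTorch sketch (illustrative only; the shapes are arbitrary and not drawn from any real model) showing that the core operation of a neural network layer is a matrix product, and that routing it to an Nvidia GPU is essentially a one-line change:

```python
# Minimal, illustrative sketch: a neural-network layer reduces to a matrix
# multiplication, and dispatching it to an Nvidia GPU's CUDA kernels is a
# one-line change. Shapes are arbitrary.
import torch

batch, d_in, d_out = 64, 4096, 4096
x = torch.randn(batch, d_in)   # a batch of activations
w = torch.randn(d_in, d_out)   # a layer's weight matrix

y_cpu = x @ w                  # the multiply, executed on the CPU

if torch.cuda.is_available():
    # Moving the tensors onto the GPU routes the same multiply to CUDA
    # kernels, where massively parallel hardware does the work.
    y_gpu = x.cuda() @ w.cuda()
```

Training a modern model is, at bottom, this operation repeated enormous numbers of times, which is why raw matrix multiply throughput became the scarce resource.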
How Dominant Is It, Really?
The numbers are stark. Nvidia held approximately 70 to 80 percent of the AI training chip market through 2024 and into 2025. Its H100 GPU — the chip that became synonymous with AI infrastructure investment — was so constrained in supply during peak demand that companies were paying significant premiums on secondary markets and reporting multi-quarter delivery wait times from Nvidia directly. When OpenAI, Google, Anthropic, and Microsoft talked about compute as the limiting constraint on AI development, what they largely meant was Nvidia H100 availability.
The margins reflect the position. Nvidia's data center gross margins have run above 70 percent — extraordinary for a hardware company, more typical of software businesses with near-zero marginal costs. The company is, in effect, extracting software-like economics from a hardware product because its competitive position is strong enough to support pricing that hardware commodity economics would never allow.
The Challengers
The scale of Nvidia's margins and market position has attracted every major technology company into the chip business in ways that would have seemed implausible a decade ago.
AMD is the most conventional challenger. Its MI300X GPU is a genuine competitor to Nvidia's H100 on raw performance metrics, and AMD has been making targeted progress on the software ecosystem problem — investing heavily in ROCm, its alternative to CUDA, and working with major AI labs to ensure their training frameworks run well on AMD hardware. Several hyperscalers have begun deploying MI300X at meaningful scale, and AMD reported strong data center GPU revenue growth through 2024. It is not close to matching Nvidia's market share, but it is establishing a credible alternative.
Google has been designing its own AI chips — Tensor Processing Units, or TPUs — since 2015. Recent generations, including TPU v5, power a significant portion of Google's internal AI workloads and are available to external customers through Google Cloud. Google's approach is not to sell chips but to sell cloud compute that happens to run on its own hardware — a model that captures the economics without the complexity of a chip business.
Amazon similarly designs custom AI chips — Trainium for training and Inferentia for inference — that power AWS cloud offerings. The economics are compelling: if Amazon can run AI workloads on chips it manufactures rather than ones it buys from Nvidia, it captures the margin that would otherwise flow to Nvidia. The chips are not yet competitive with Nvidia at the frontier of training capability, but for inference at scale — running queries against deployed models rather than training new ones — they are increasingly viable alternatives.
The most aggressive challenge comes from a cohort of startups that have raised billions of dollars on the premise that purpose-built AI chips can outperform general-purpose GPUs for specific workloads. Cerebras Systems builds wafer-scale chips — single chips the size of an entire silicon wafer rather than the small dies traditional chips use — that offer extraordinary memory bandwidth for certain AI architectures. Groq has built chips optimized specifically for inference speed, claiming performance that significantly exceeds GPU-based alternatives for deployed model serving. SambaNova, Graphcore, Tenstorrent, and others are pursuing variations on the theme of domain-specific AI hardware.
Why CUDA Is the Real Moat
The hardware competition, as fierce as it is, misses what many insiders identify as Nvidia's most durable advantage: CUDA. CUDA is the programming framework that developers use to write code that runs on Nvidia GPUs. It was released in 2006 — long before AI training was a commercially relevant application — and over nearly two decades it has accumulated an extraordinary amount of developer tooling, libraries, documentation, trained developers, and institutional knowledge.
Every major AI training framework — PyTorch, TensorFlow, JAX — is built with primary support for CUDA. The models researchers publish, the reference implementations that practitioners rely on, the tutorials that train the next generation of AI engineers — virtually all of it assumes CUDA. Switching to a different chip vendor means not just swapping hardware but porting code, retraining engineering teams, and debugging the subtle differences in how operations are implemented. For most organizations, the total cost of switching exceeds the cost savings from potentially cheaper hardware.
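A hedged illustration of what that switching cost looks like in practice (the model, optimizer, and loss below are stand-ins, not anyone's production code): even a routine PyTorch training step touches CUDA-specific entry points in several places, before counting custom kernels, profilers, and the debugging habits built around them.

```python
# Illustrative training-step fragment with a hypothetical model and toy loss.
# The point is how many lines assume Nvidia's stack: the device string, the
# non-blocking transfer, and the CUDA mixed-precision utilities all have to
# be revisited when porting to another vendor's hardware and toolchain.
import torch

device = torch.device("cuda")                       # vendor-specific backend
model = torch.nn.Linear(1024, 1024).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()                # CUDA loss scaling

def train_step(batch: torch.Tensor) -> float:
    batch = batch.to(device, non_blocking=True)     # assumes CUDA streams
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(batch).pow(2).mean()           # toy loss for illustration
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
    opt.zero_grad(set_to_none=True)
    return loss.item()
```

None of these lines is hard to change in isolation; the cost comes from changing thousands of them across a codebase while verifying that numerics, performance, and third-party libraries still behave the same.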
This software ecosystem lock-in is the reason Nvidia can maintain GPU margins that hardware economics should not support. The challengers understand this — it is why AMD's investment in ROCm is not optional but existential, and why Google and Amazon build their alternative ecosystems around cloud APIs rather than asking customers to rewrite their code.
The Export Control Complication
Nvidia's position is complicated by geopolitics in ways that create both constraints and opportunities. US export controls restrict the sale of the most capable Nvidia AI chips to China — a market that had previously been a significant revenue source. Nvidia responded by creating chips specifically designed to comply with export control thresholds — the A800 and H800 for the Chinese market — but successive rounds of export control tightening have constrained even these alternatives.
The export controls have the paradoxical effect of both protecting and threatening Nvidia. They protect Nvidia in that they similarly constrain competitors — AMD cannot sell its MI300X to Chinese customers either. They threaten Nvidia in that they are driving Chinese technology investment toward domestic alternatives with an urgency that might not otherwise exist. Huawei's Ascend chips, developed specifically to provide an alternative to US AI hardware, have been deployed at meaningful scale within China and represent a long-term strategic challenge to US chip dominance, one that the export controls are simultaneously responding to and accelerating.
What the Next Hardware Generation Looks Like
Nvidia is not standing still. The Blackwell GPU architecture, succeeding Hopper, was announced in early 2024 and began shipping at scale through late 2024 and 2025. Blackwell delivers performance improvements that maintain Nvidia's lead over current competitors, an advantage built on years of architectural development that challengers cannot replicate overnight.
The more interesting question is whether the architectural requirements of AI workloads are converging in ways that might reduce the GPU advantage over time. Inference — the dominant workload in terms of volume, if not in terms of compute intensity — has different optimization targets than training. It favors low latency, high throughput, and energy efficiency over the raw floating-point performance that GPUs excel at. The inference-optimized chips from Groq and others are making a credible case that purpose-built inference hardware can outperform GPUs for specific deployed model serving applications.
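A toy calculation (hypothetical numbers, purely illustrative) makes the trade-off concrete: batching requests raises throughput, which is what training-style hardware rewards, but it also raises the latency each individual request experiences, which is what serving systems are judged on.

```python
# Illustrative only: how batch size trades latency against throughput for a
# hypothetical accelerator, ignoring queueing and other serving overheads.
def serving_profile(batch_size: int, step_ms_per_batch: float):
    """Return (per-request latency in ms, throughput in requests/sec)."""
    latency_ms = step_ms_per_batch                 # each request waits for the full batch
    throughput = batch_size / (step_ms_per_batch / 1000.0)
    return latency_ms, throughput

# Hypothetical chip that takes 20 ms at batch=1 and 80 ms at batch=32:
print(serving_profile(1, 20.0))    # (20.0, 50.0)  low latency, low throughput
print(serving_profile(32, 80.0))   # (80.0, 400.0) 8x the throughput, 4x the latency
```

Inference-first designs aim to push that curve toward the low-latency end without giving up throughput or burning the energy budget a general-purpose GPU would.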
If training and inference bifurcate into distinct hardware markets — each with different leaders — Nvidia's current dominance of the combined market becomes a more complicated picture. That bifurcation is not certain, and Nvidia is investing to prevent it. But it is the scenario that Nvidia's competitors are most actively designing toward, and it may represent the most credible path to meaningful disruption of the current market structure.
For now, Nvidia remains the company that built a monopoly by being in the right place at the right time with hardware that turned out to matter enormously — and then investing aggressively enough in software and ecosystem to make that position durable. Whether that durability persists through the next generation of AI hardware is the most consequential business question in technology right now.
