The Infrastructure Bubble: TurboQuant, $690B Capex, and the AI Hardware Reckoning

Why the application bubble may be wrong but the infrastructure bubble is real — and accelerating
PRZC Research | March 28, 2026 | AI Infrastructure Analysis

Executive Summary

On March 24–25, 2026, Google Research published TurboQuant — a compression algorithm that reduces the memory required to run large language models during inference by at least 6x, with zero accuracy loss, requiring no retraining. Memory stocks responded immediately: SK Hynix fell 6%, Samsung fell 5%, Micron fell 5%, Kioxia fell 6%. Morgan Stanley told investors to buy the dip. TrendForce called the reaction "likely an overreaction." Both may be right about the immediate quarter — and both are wrong about the structural trajectory.

TurboQuant is not an isolated event. It is the latest data point in a compounding series of efficiency breakthroughs — DeepSeek V3, Mamba/SSM hybrid architectures, Mixture-of-Experts, the Groq LPU acquisition by Nvidia, SanDisk's High Bandwidth Flash — that are collectively exerting deflationary pressure on the hardware requirements per unit of AI compute. Each breakthrough alone is manageable. The trend they collectively represent is not.

This report makes a specific, structural argument: the AI application bubble and the AI infrastructure bubble are distinct phenomena and must be evaluated separately. Our companion report (T38) argues that Anthropic's revenue trajectory, enterprise adoption data, and market displacement effects are inconsistent with a bubble at the application layer. That argument holds. But the infrastructure layer operates under entirely different economics — and those economics are increasingly fragile.

The four core findings:

  1. TurboQuant is a genuine breakthrough — but its direct market impact is narrow. It compresses inference-time KV cache in DRAM, not the HBM inside training GPUs. The immediate memory stock selloff overshot. But the underlying signal — sustained algorithmic efficiency compressing hardware requirements — is the correct read.
  2. The infrastructure investment scale is historically unprecedented and structurally mismatched. $690B in hyperscaler capex in 2026, financed partly by $182B in new debt, is being deployed into hardware with a 1-year effective economic life against debt structures assuming 7–15 year asset lifetimes.
  3. Multiple converging architectural shifts — Groq LPU, Cerebras wafer-scale, High Bandwidth Flash, PNM/PIM memory, SSM/Mamba hybrids — are pointing away from the GPU+HBM stack that the entire current buildout is optimized for. None dominates today. The trajectory is clear.
  4. The Jevons Paradox is real but bounded. Cheaper compute does generate more demand — the evidence from DeepSeek is empirically clear. But Jevons requires that marginal demand grows faster than marginal efficiency gains. The $690B capex commitment already baked in a Jevons assumption. If efficiency gains outpace demand expansion — even temporarily — the debt structures crack before the demand side catches up.

I. Google TurboQuant — What It Actually Does

The Breakthrough

TurboQuant was published at ICLR 2026 on March 24–25, 2026. The paper targets a specific and well-known bottleneck in AI inference: the KV cache (key-value cache). Every time an LLM generates tokens, it must store and retrieve attention context for all prior tokens in the sequence. As context windows have grown from 4,096 tokens to 200,000 tokens and beyond, KV caches have ballooned from megabytes to hundreds of gigabytes per concurrent session. This is why inference at scale is memory-bound, not compute-bound — the GPU sits idle waiting for memory reads.
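The KV-cache arithmetic above can be made concrete with a back-of-envelope sketch. The model dimensions below are illustrative (roughly a 70B-class model with grouped-query attention) and are not taken from the TurboQuant paper:

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Per-session KV cache: keys + values, for every layer and token (fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

# Hypothetical 70B-class model with grouped-query attention
# (dimensions are assumptions for illustration only):
cfg = dict(n_layers=80, n_kv_heads=8, head_dim=128)

for ctx in (4_096, 200_000):
    gib = kv_cache_bytes(ctx, **cfg) / 2**30
    print(f"{ctx:>7}-token context -> {gib:6.2f} GiB of KV cache per session")
```

At a few dozen concurrent long-context sessions, this reaches the hundreds of gigabytes cited above — all of it living in server DRAM, not HBM.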

TurboQuant compresses the KV cache to 3 bits per value — down from the 16- or 32-bit floating-point representations typical in inference serving — with zero accuracy loss on all major long-context benchmarks. At 4-bit compression it achieves up to an 8x speedup in computing attention logits on Nvidia H100 GPUs. The compression is applied online, at inference time, without modifying the model.

No retraining. No fine-tuning. Works on existing models including Gemma and Mistral families. Open-source PyTorch implementations appeared on GitHub within 48 hours of the paper's publication.
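For intuition on what low-bit KV compression means mechanically, the sketch below applies generic per-row symmetric uniform quantization to toy KV rows at 3 bits. This is a minimal illustration of the general class of techniques, not TurboQuant's actual algorithm:

```python
import numpy as np

def quantize_uniform(x, n_bits=3):
    """Per-row symmetric uniform quantization -- a generic baseline,
    NOT TurboQuant's actual scheme."""
    qmax = 2 ** (n_bits - 1) - 1                      # 3 at 3 bits
    scale = np.abs(x).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0.0, 1.0, scale)        # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 128)).astype(np.float32)  # toy KV-cache rows
q, scale = quantize_uniform(kv)
err = float(np.abs(dequantize(q, scale) - kv).mean())
print(f"mean abs reconstruction error at 3 bits: {err:.3f}")
```

Naive uniform quantization at 3 bits loses noticeable precision; the claim of zero accuracy loss is precisely what separates TurboQuant from this baseline. The stored payload drops from 16 bits to 3 bits per value plus per-row scales — on the order of the ~6x figure cited.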

What TurboQuant Does Not Do

Morgan Stanley's "buy the dip" call and TrendForce's "overreaction" assessment are defensible on one specific technical claim: TurboQuant does not directly reduce HBM demand. This distinction is critical and worth dwelling on.

There are two separate memory systems in a modern AI data center:

| Memory Type | Location | Function | TurboQuant Impact |
|---|---|---|---|
| HBM (High Bandwidth Memory) | Inside the GPU die package (stacked on chip) | Stores model weights; feeds compute cores during forward pass | None — not targeted |
| DRAM / Server RAM | On the inference server motherboard, outside the GPU | Stores KV cache during extended conversations; feeds data to the GPU | Direct 6x compression |

HBM is what SK Hynix, Micron, and Samsung are supplying to Nvidia for H100, H200, and Blackwell GPUs. TurboQuant compresses the data that lives in server DRAM, not in HBM. So the immediate impact on the HBM supply chain is limited. Morgan Stanley is technically correct.

But this is a narrow argument deployed in defense of a broad position. The question is not "does TurboQuant hurt HBM this quarter?" The question is: "what does the compounding trajectory of efficiency improvements like TurboQuant imply for the $690B of infrastructure investment being made this year?" That is a different and more dangerous question.

The Right Frame

TurboQuant is not a torpedo aimed at HBM. It is a data point confirming a structural trend: the software layer is relentlessly closing the gap with the hardware layer's appetite. Every efficiency gain of this type extends the useful life of existing hardware, reduces the urgency of the next upgrade cycle, and narrows the revenue window for infrastructure built on today's assumptions. The market's initial reaction was wrong about the target. Its instinct about the direction was correct.

The Context: A Cascade of Memory/Compute Efficiency Breakthroughs

TurboQuant does not exist in isolation. It is the latest in a sequence of algorithmic breakthroughs that are collectively compressing the hardware cost per unit of AI capability. The trend is not new — but its pace is accelerating.

| Breakthrough | Date | Efficiency Gain | Target |
|---|---|---|---|
| DeepSeek V3 (MoE + MLA) | January 2025 | ~90% reduction in reasoning cost | Training + inference compute |
| Apple "LLM in a Flash" | Ongoing (arXiv 2023, deployed 2024–25) | Run models 2x DRAM size from flash; 20–25x CPU speedup | On-device inference memory |
| IBM Granite 4 (Mamba-2 hybrid) | November 2025 | Up to 8x faster token generation vs. equivalent transformer | Inference compute + memory |
| Nvidia Nemotron 3 Super (MoE) | Early 2026 | 7.5x higher throughput; only 12.7B of 120.6B params active per pass | Inference compute |
| Groq 3 LPU (Nvidia, post-acquisition) | March 16, 2026 (GTC) | Deterministic streaming; no external HBM required | Inference memory architecture |
| Google TurboQuant | March 24–25, 2026 | 6x KV cache compression; 8x attention logit speedup | Inference DRAM (KV cache) |

Each row in this table represents a published, deployable, or production-track breakthrough. The cumulative effect across the stack — training compute, inference compute, inference memory, on-device memory — is an AI capability cost curve that is falling far faster than the infrastructure investment curve. The capex being committed in 2026 is being priced against hardware assumptions that are becoming obsolete before the concrete dries.


II. The Scale of the Infrastructure Investment

$690 Billion in a Single Year

The five largest US technology companies have committed a combined $660–690 billion in capital expenditure for 2026, approximately 75% of which (~$450–500 billion) is directed at AI infrastructure: data centers, chips, cooling, power, networking.

| Company | 2024 Capex | 2025 Capex | 2026 Capex (committed) | YoY Growth |
|---|---|---|---|---|
| Amazon (AWS) | ~$77B | ~$125B | ~$200B | +60% |
| Alphabet / Google | ~$52.5B | ~$95B | $175–185B | +84–95% |
| Microsoft | ~$55B | ~$80B | $120B+ | +50% |
| Meta | ~$38B | ~$65B | $115–135B | +77–108% |
| Oracle | ~$9B | ~$15B | ~$25–30B | +67–100% |
| **Total** | ~$231B | ~$380B | ~$635–670B | +67–76% |

For context: the entire US electric utility industry invested approximately $160 billion in generation, transmission, and distribution in 2024. The five largest technology companies are outspending the entire US utility sector on energy-adjacent AI infrastructure by a factor of four. The comparison is not rhetorical — power grid constraints are now the single most cited bottleneck in data center deployment.

The Debt Bridge — A Structural Warning

This capex is no longer being funded purely from free cash flow. Hyperscalers issued $108–182 billion in new debt in 2025 — roughly double the prior year. Cumulative projected debt issuance over the coming years is estimated at $1.5 trillion. The debt instruments have maturities of 7–15 years.

This creates a structural mismatch that is the core of the infrastructure bubble risk:

The Structural Mismatch

The effective economic life of current-generation AI chips (H100, H200, Blackwell) is approximately 12 months before a successor generation renders them competitively obsolete. Debt instruments financing data centers assume 7–15 year asset lives. Private equity investors funding AI infrastructure expect venture-style 100x returns. Creditors expect stable infrastructure returns. These three expectations — 1-year chip vintage, 10-year debt, 100x equity return — are simultaneously irreconcilable and baked into the same capital structure.
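A toy amortization model makes the mismatch concrete. Every number below is an illustrative assumption, not a PRZC estimate: $100 of 10-year debt at 6%, financing a chip vintage whose competitive revenue halves each year as successor generations ship:

```python
def annuity_payment(principal, rate, years):
    """Level annual payment on an amortizing loan."""
    return principal * rate / (1 - (1 + rate) ** -years)

# Illustrative assumptions only:
debt, rate, tenor = 100.0, 0.06, 10
payment = annuity_payment(debt, rate, tenor)
revenue_y1 = 30.0   # hypothetical first-year revenue from the financed hardware

for year in range(1, tenor + 1):
    revenue = revenue_y1 * 0.5 ** (year - 1)   # revenue halves each year
    flag = "  <-- shortfall" if revenue < payment else ""
    print(f"year {year:2d}: debt service {payment:5.2f} vs hardware revenue {revenue:5.2f}{flag}")
```

Under these assumptions the vintage covers its own debt service for only two years; years 3–10 must be carried by the next vintage's revenue, which faces the same decay. That is the sense in which 1-year chips and 10-year debt are irreconcilable without continuous demand growth.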

Ares Management Co-President Kipp deVeer stated this explicitly in October 2025: "If you look historically in areas like this over the past 20 or 30 years, typically when this much capacity comes online, some of it at the end of the day has to be marginal." Analyst Gil Luria was more direct: "If the market for artificial intelligence were even to steady in its growth, pretty quickly we will have over-built capacity, and the debt will be worthless, and the financial institutions will lose money."


III. The HBM Market — Shortage Today, Fragility Tomorrow

The Current State: An Unprecedented Seller's Market

The near-term HBM picture is unambiguously positive for suppliers. High Bandwidth Memory — the stacked DRAM packages inside Nvidia GPUs that store model weights and feed compute cores — is supply-constrained through at least 2026. All three major producers have effectively pre-sold their entire 2026 output.

| Supplier | HBM Market Share (Q2 2025) | 2026 Outlook | Key Metrics |
|---|---|---|---|
| SK Hynix | 62% | ~70% of HBM4 for Nvidia Rubin (UBS) | 2026 revenue forecast +37.9% YoY; wafer shortage "could last until 2030" |
| Micron | ~20% (ramping) | 2026 HBM supply entirely pre-sold; HBM4 ramp in Q2 2026 | Q2 FY2026 revenue $23.86B (+57% YoY); HBM TAM forecast $100B by 2028 |
| Samsung | ~18% | Recovering position; warned of memory shortage driving prices up | Collaborating on HBF standardization; advancing PIM-enabled LPDDR6 |

SK Hynix's statement that "memory wafer shortages could last until 2030" captures the near-term consensus. In this environment, Morgan Stanley's "buy the dip" reflex on TurboQuant news is understandable — the near-term supply/demand balance is too tight for algorithmic efficiency gains to make a dent in 2026 or even early 2027.

The 2027–2028 Exposure

The consensus view breaks down when the analysis extends beyond the current shortage cycle. Three forces converge in the 2027–2028 window that make the HBM market structurally vulnerable:

Force 1: All three suppliers are massively expanding capacity simultaneously

Micron raised capex to $20 billion specifically for HBM expansion. SK Hynix is spending capex equivalent to the mid-30-percent range of its revenue. Samsung has committed its most aggressive memory capex since the DRAM wars of the early 2010s. When all three major players expand simultaneously into a market driven by a single customer ecosystem (Nvidia's GPU platform), overcapacity risk is not hypothetical — it is the base case for every prior memory super-cycle. DRAM spot prices dropped 50% from 2023–2025 in standard segments before HBM shielded producers from the correction. That shielding is not permanent.

Force 2: HBF (High Bandwidth Flash) is coming to market in 2027

SanDisk's High Bandwidth Flash integrates NAND flash with HBM packaging, offering up to 16x the capacity of HBM at comparable bandwidth and similar price points. First samples are targeted for H2 2026; first AI inference devices with HBF are expected in early 2027. SK Hynix, Samsung, and SanDisk are all collaborating on HBF standardization. This is not a fringe development — three of the four largest memory companies on earth are building toward a new memory format that, at 16x capacity advantage, could make the $35 billion HBM market look like a transitional technology.

Force 3: Architecture migration away from GPU+HBM is accelerating

The GPU+HBM stack is not the only architecture in the field. Three alternative approaches are in active development or production:

  1. SRAM-based LPUs (Groq, now Nvidia) — deterministic streaming inference from 230MB of on-chip SRAM at 80 TB/s, with no external HBM.
  2. Wafer-scale engines (Cerebras WSE-3) — 44GB of on-wafer SRAM at 220+ TB/s, eliminating external memory entirely.
  3. Processing-near/in-memory (PNM/PIM) — compute placed inside or beside commodity DRAM, as in Samsung's PIM-enabled LPDDR6, reducing the cost of shuttling KV data to the GPU.

None of these alternatives currently threatens Nvidia's GPU dominance for training. But inference is a different workload — latency-sensitive, memory-bandwidth-bound, and increasingly the economic center of gravity as pre-training spend matures. If inference migrates toward LPU/wafer-scale/PNM architectures, the demand for stacked HBM packages shrinks with it.

The Irony

Nvidia's $20B acquisition of Groq — a chip designed to not need HBM — is arguably the most important single signal in this analysis. When the dominant GPU vendor spends $20 billion to acquire an inference architecture that bypasses its own memory ecosystem, it is effectively pricing in the probability that HBM-dependent inference has a finite runway. The market has not fully priced this implication.

IV. Data Center Overbuilding — The Power Wall

The Stranded Asset Problem Is Already Visible

Data center deals hit a record $61 billion in 2025. But the constraint is no longer capital — it is power. Seventy-two percent of data center operators identify power and grid capacity as their most severe operational challenge. The consequence is already materializing: €5.8 billion ($6.8 billion) in Irish data center projects are stranded — land acquired, planning permission secured, construction permits granted, but no grid connection available. These are not hypothetical future assets. They are real capital deployed into unmonetizable real estate.

The Power Consumption Trajectory

U.S. data center electricity consumption is projected to reach 300 TWh by 2028, roughly doubling from current levels. AI data centers are forecast to consume 9% of all U.S. electricity by 2030. Current load is approximately 41 GW, growing 15–20% annually. The U.S. electric grid — much of which was built 30–50 years ago — was not designed for this growth rate. New grid capacity requires 5–10 year permitting and construction timelines. The AI buildout cycle runs an order of magnitude faster.

| Metric | 2024 | 2026 (current) | 2028 (projected) | 2030 (projected) |
|---|---|---|---|---|
| US data center power load | ~25 GW | ~41 GW | ~55 GW | ~75 GW |
| US data center TWh/year | ~150 TWh | ~200 TWh | ~300 TWh | ~400+ TWh |
| As % of US electricity generation | ~3.7% | ~5% | ~7% | ~9% |

The power constraint creates a category of infrastructure risk that is entirely independent of AI capability progress: even if AI application adoption grows exactly as projected, and even if hardware efficiency stays constant, the physical power infrastructure to host the built-out data center capacity may simply not exist where it is needed. The new industry performance metric — "tokens per watt per dollar" — is a direct acknowledgment of this reality. The previous metric (raw FLOPS or raw GPU count) no longer captures the binding constraint.
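The new metric can be read as a simple figure of merit. A minimal sketch — both the definition and the deployment figures below are assumptions, since the industry metric is not standardized:

```python
def tokens_per_watt_dollar(tokens_per_sec, watts, dollars_per_hour):
    """One plausible reading of 'tokens per watt per dollar': throughput
    normalized by both power draw and operating cost. Definition assumed."""
    return tokens_per_sec / (watts * dollars_per_hour)

baseline = tokens_per_watt_dollar(500, 700, 2.0)    # hypothetical GPU node
efficient = tokens_per_watt_dollar(1500, 700, 2.0)  # same power, 3x throughput
print(f"efficiency-optimized stack scores {efficient / baseline:.1f}x the baseline")
```

The point of the metric is visible even in this toy form: at fixed power and cost, only throughput efficiency moves the score — raw FLOPS and GPU count drop out entirely.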

The Cooling Premium

Modern AI chips generate heat densities that standard air-cooled data centers cannot handle. Retrofitting existing air-cooled facilities for liquid cooling costs 7–10% more than building new liquid-cooled facilities from scratch. The legacy data center stock built during the 2015–2022 cloud buildout — worth hundreds of billions — faces either expensive retrofit or functional obsolescence as AI compute density requirements grow. This is a hidden write-down embedded in infrastructure balance sheets that has not yet been marked to market.


V. The Jevons Paradox — The Bull Case's Strongest Argument

The Paradox and Its Evidence

The Jevons Paradox, first articulated by economist William Stanley Jevons in 1865 regarding coal consumption, holds that when the efficiency of a resource's use improves, total consumption of that resource rises rather than falls — because lower cost per unit enables new applications that did not previously exist at higher prices. Applied to AI infrastructure: every time compute becomes cheaper, demand for AI services expands faster than the efficiency gain, and total hardware demand grows.

The empirical evidence for Jevons in AI is genuinely strong. In January 2025, DeepSeek V3 caused Nvidia to fall 17% in a single session. The market assumed cheaper AI meant less GPU demand. Within three months, AI inference demand had grown so rapidly that the net effect on GPU order books was positive. Inference costs dropped roughly 90% post-DeepSeek; total AI usage and GPU demand grew. H100 cloud instance prices declined 64–75% from Q4 2024 to Q1 2026 — yet Nvidia's order book reached $1 trillion at GTC 2026. This is Jevons operating precisely as the theory predicts.
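The Jevons condition can be stated precisely with a constant-elasticity sketch. Under demand Q ∝ price^(−ε), a g-fold efficiency gain multiplies total hardware demand by g^(ε−1), so the rebound requires ε > 1. The elasticity values below are hypothetical:

```python
def hardware_demand_multiplier(efficiency_gain, elasticity):
    """Net change in total hardware demand after an efficiency_gain-fold cost
    reduction, assuming constant-elasticity demand Q ~ price**(-elasticity).
    Hardware per unit of service falls by 1/gain; demand rises by
    gain**elasticity; net multiplier is gain**(elasticity - 1)."""
    return efficiency_gain ** (elasticity - 1)

for eps in (0.5, 1.0, 1.5):   # hypothetical elasticities
    m = hardware_demand_multiplier(6.0, eps)   # 6x, the TurboQuant figure
    print(f"elasticity {eps}: total hardware demand x{m:.2f}")
```

At elasticity 1.5, a 6x efficiency gain more than doubles hardware demand — the DeepSeek pattern. At 0.5, the same gain cuts hardware demand by more than half. The bull and bear cases differ on a single parameter.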

Morgan Stanley applies the same argument to TurboQuant: if inference memory costs fall 6x, more AI services become economically viable, and total memory demand grows rather than falls. KKR argues AI infrastructure demand is durable and real. Microsoft's data center executive said he is "more worried we are underbuilding than overbuilding."

Where Jevons Has Limits

The Jevons argument is compelling — but it contains an assumption that is not examined carefully enough in the current AI infrastructure debate: Jevons requires that marginal demand can grow fast enough to absorb efficiency gains before the financial structures financing the infrastructure mature.

This is where the infrastructure bubble risk is specific and different from a simple overbuilding story. Consider the chain of logic:

  1. $690B in 2026 capex is committed and largely deployed or under contract.
  2. Much of this capex is financed by debt with 7–15 year maturities.
  3. The debt was underwritten assuming a certain level of utilization revenue — which in turn assumed hardware capability requirements would remain roughly stable enough to keep existing chips economically relevant.
  4. Deflationary efficiency gains (TurboQuant, Mamba/SSM, HBF, PNM) mean that the same AI workload can run on progressively less hardware — or on hardware that is different in architecture from what was financed.
  5. If the demand expansion from Jevons happens on a 2–3 year cycle, but the financial structures crack on a 12–18 month cycle (because chip obsolescence is faster than the refinancing window), the Jevons effect arrives too late to save the debt.

Sam Altman — not a habitually bearish voice on AI — said in August 2025: "Are we in a phase where investors as a whole are overexcited about AI? My opinion is yes." This statement was made by the CEO of the company that is the most AI-optimistic large institution on earth. Its significance has been underweighted by the market.


VI. The Scaling Law Question — Is the Compute Hunger Permanent?

The "More Compute = Better AI" Assumption

The entire infrastructure investment thesis rests on a scaling law assumption: that larger models trained on more data with more compute produce systematically better capabilities. This assumption was empirically validated from approximately 2018 through 2024 and drove Nvidia's extraordinary revenue growth. The question for 2025–2030 is whether the assumption continues to hold — and the evidence is genuinely mixed.

Evidence that scaling is maturing: frontier pre-training runs are showing diminishing returns per marginal dollar, and the headline capability gains of 2025–26 have come disproportionately from efficiency work — MoE routing, MLA, Mamba hybrids — rather than from raw parameter count.

Evidence that scaling continues on new axes: post-training and test-time compute are still delivering measurable capability gains, and the inference demand they generate is strong enough to underwrite a $1 trillion Nvidia order book.

The Three-Axis Scaling Framework (2026)

The current consensus is that three separate scaling dimensions are operating simultaneously, with different cost/benefit profiles:

| Scaling Axis | Status | Infrastructure Implication |
|---|---|---|
| Pre-training scale (larger models + more data) | Maturing; diminishing returns at frontier | Massive but potentially peaking training cluster demand |
| Post-training / RLHF / fine-tuning | Active; primary capability driver at frontier | Continuous but smaller-scale GPU demand; different hardware profile |
| Test-time compute (inference-time reasoning) | Emerging; not yet saturating | High inference compute demand — with memory efficiency (TurboQuant) as the key constraint |

The significance: if pre-training scaling is maturing, the marginal return on the massive training cluster buildout embedded in hyperscaler capex is declining. The demand growth is migrating toward inference — which is exactly the workload being targeted by TurboQuant, Mamba, Groq LPU, and Cerebras. The infrastructure built for pre-training (massive HBM-dense GPU clusters optimized for matrix multiplication throughput) is the wrong profile for the workload that is growing fastest.


VII. The Two-Bubble Map — Application vs. Infrastructure

This report, read alongside its companion (The Anti-Bubble Thesis), maps a specific thesis: the application and infrastructure layers of the AI economy have different bubble risk profiles, different time horizons, and different burst mechanisms.

| Dimension | AI Application Layer | AI Infrastructure Layer (This Report) |
|---|---|---|
| Is it a bubble? | No — displacement cycle with real revenue | Partially — structural fragility is real and growing |
| Revenue validation | Anthropic $20B ARR, enterprise-contracted | Hyperscaler AI revenue growing but far below the $690B capex run-rate |
| Burst mechanism | Would require sustained enterprise ROI failure | Debt maturity mismatch + architecture migration + power grid constraints |
| Burst timeline | Not imminent; 5+ year displacement cycle | 2027–2028 window; keyed to debt refinancing cycles and HBM capacity |
| Key signal to watch | Enterprise contract renewal rates; Claude/OpenAI enterprise churn | H100/H200 utilization rates; HBM spot prices post-2026 shortage; hyperscaler AI revenue vs. capex ratio |
| Winners if the layer deflates | Anthropic, Microsoft, Salesforce (via integration) | Efficient-architecture providers (Groq/Nvidia LPU, Cerebras); power generators; cooling |
| Losers | Enterprise SaaS companies being displaced | HBM suppliers in overcapacity; data center REITs with stranded assets; debt investors in AI infra vehicles |

VIII. Investment Implications

PRZC Research does not provide individual security recommendations in this format. The following represents sector-level framing for institutional consideration.

The near-term memory trade (2026)

Morgan Stanley and TrendForce are likely correct that the initial TurboQuant selloff overshot. HBM supply is sold out through 2026. SK Hynix and Micron have the most favorable near-term supply/demand positioning in memory they have had in a decade. The 5–6% single-session drop in memory stocks on the TurboQuant announcement represents an overreaction to a paper that targets DRAM, not HBM, in a market where HBM is presold. Near-term: the dip is defensible to buy on fundamentals.

The medium-term memory trade (2027–2028)

The consensus bullish case for memory through 2028 assumes the GPU+HBM architecture remains the dominant inference stack. This assumption faces three specific challenges that arrive in the 2026–2027 window: HBF samples reaching market (H2 2026), Groq 3 LPU scaling in Nvidia's own product line, and PNM/PIM-enabled LPDDR6 reaching mass production. If any two of these three materialize at commercial scale, the HBM market enters overcapacity before its TAM expansion thesis completes. The $35B → $100B HBM TAM forecast through 2028 is the right number if architecture stays constant. It is a generous number if architecture migrates.

Hyperscaler capex sustainability

The $690B 2026 capex commitment is real and locked in. But the sustainability of the 2027–2028 trajectory depends on AI cloud revenue growing into the capex commitment faster than debt maturities come due. At current AI cloud revenue growth rates, the gap is closing — but slowly. The ratio to watch is hyperscaler AI-attributed cloud revenue as a percentage of AI-directed capex. When this ratio crosses 30%, the model is self-sustaining. The current estimate for 2026 puts it at roughly 15–20%. The debt structures begin to stress in the 2027–2028 window if that ratio does not improve substantially.
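The ratio's trajectory can be sketched with the report's 2026 estimates and placeholder growth assumptions. The 45% revenue growth and 10% capex growth rates below are hypothetical, not forecasts:

```python
# The report's 2026 estimates; growth rates are placeholder assumptions.
ai_capex = 500.0      # $B, AI-directed share of the ~$690B total
ai_revenue = 90.0     # $B, giving an 18% ratio (inside the 15-20% estimate)

rev_growth, capex_growth, threshold = 0.45, 0.10, 0.30
year = 2026
while ai_revenue / ai_capex < threshold:
    print(f"{year}: AI revenue / AI capex = {ai_revenue / ai_capex:.0%}")
    ai_revenue *= 1 + rev_growth
    ai_capex *= 1 + capex_growth
    year += 1
print(f"{year}: ratio {ai_revenue / ai_capex:.0%} -- crosses the {threshold:.0%} threshold")
```

Even with revenue compounding 4–5x faster than capex under these assumptions, the crossover lands in 2028 — inside the window in which the 2025-vintage debt begins to season.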

The efficient-architecture trade

The structural beneficiaries of a shift away from HBM-dense GPU stacks are the providers of efficient inference architectures: Nvidia (which already owns both sides of the bet via Groq acquisition), Cerebras, and the companies building PNM/HBF-native inference infrastructure. Power generation and grid infrastructure companies benefit regardless of which compute architecture wins — the power demand is inelastic to chip brand. Nuclear, grid-scale storage, and transmission infrastructure are the cleanest infrastructure plays in the AI economy.

Data center real estate

Data center REITs and private credit vehicles with long-duration exposure to AI infrastructure carry the highest concentration of the structural mismatch risk. The assets are real; the revenue is contractually committed for 5–10 year leases; but the risk is mark-to-model on the terminal value of the physical plant. If the AI workloads at maturity run on architectures that require 1/6th the physical footprint of today's GPU clusters (combining TurboQuant-type efficiency gains with Mamba/SSM throughput improvements), the terminal utilization rate of 2026-vintage data centers is materially lower than underwriting models assume.


Conclusion: Two Bubbles, One Economy

The AI bubble debate has been conducted as if "AI" is a single thing with a single risk profile. It is not. The application layer — where Anthropic, OpenAI, and the enterprise software transformation are happening — has the characteristics of a genuine platform shift: real revenue, real enterprise contracts, real productivity gains in measured use cases, real market displacement. The bubble narrative fails at the application layer.

The infrastructure layer operates under different physics. $690 billion per year in capex, financed by $182 billion in new annual debt, deployed into hardware with a 12-month economic life, built on power grids that cannot sustain the trajectory, and optimized for a GPU+HBM architecture that three separate technological vectors are actively migrating away from — this is not the same story. This is a structural fragility story with a 2027–2028 inflection point.

Google TurboQuant, by itself, changes nothing. As a data point in a trend, it changes the probability distribution of the infrastructure fragility thesis. The market's initial reaction to the paper — sharp drops in memory stocks followed by "overreaction" calls from sell-side analysts — captures both the signal and the noise simultaneously. The signal is real. The noise was in the timing and the specific target.

The correct investor posture is not to choose between "AI is a bubble" and "AI is transformative." Both can be true in different parts of the stack at the same time. Anthropic replacing the NASDAQ 100 is a story about the application layer. The AI infrastructure reckoning is a story about what happens when $690 billion in annual physical capital formation meets a technology efficiency curve that is outrunning the financial structures built to monetize it.

"Every major technology platform shift in history has featured a period in which the infrastructure buildout runs ahead of monetization — followed by a reckoning that destroys infrastructure capital while preserving application capital. The railroad land grants funded stranded iron in the 1870s. The fiber optic overbuild of the late 1990s gave us Google. The AI infrastructure overbuild of 2025–2026 will give us something. It will not give everyone their money back."
— PRZC Research, March 2026

Appendix: Key Data Reference

| Metric | Value | Date / Source |
|---|---|---|
| Google TurboQuant KV cache compression | 6x (3-bit, zero accuracy loss) | Google Research / ICLR 2026, March 2026 |
| TurboQuant H100 attention logit speedup | Up to 8x at 4-bit | Google Research, March 2026 |
| Memory stock reaction to TurboQuant | SK Hynix -6%, Samsung -5%, Micron -5%, Kioxia -6% | CNBC, March 26, 2026 |
| Hyperscaler combined capex 2026 | $660–690 billion (75% AI-directed) | IEEE ComSoc / Futurum, 2026 |
| Hyperscaler debt issuance 2025 | $108–182 billion | Multiple, 2025 |
| Nvidia acquisition of Groq | $20 billion | December 24, 2025 |
| Groq 3 LPU on-chip SRAM | 80 TB/s; 230MB on-chip; no external HBM | Nvidia GTC 2026, March 16, 2026 |
| Cerebras WSE-3 on-wafer SRAM | 44GB at 220+ TB/s; no external memory | Cerebras, 2025 |
| SanDisk HBF capacity vs. HBM | 8–16x higher at comparable bandwidth | SanDisk, 2025 |
| SanDisk HBF first AI inference devices | Early 2027 | SanDisk roadmap, 2025 |
| SK Hynix HBM market share (Q2 2025) | 62%; ~70% of HBM4 (Rubin) per UBS | Counterpoint Research / UBS, 2025 |
| HBM TAM 2025 → 2028 forecast | $35B → $100B at ~40% CAGR | Micron, December 2025 |
| Micron Q2 FY2026 revenue | $23.86B (+57% YoY) | Micron earnings, March 2026 |
| Irish stranded data center assets | €5.8 billion | Enlit World, 2025 |
| US AI data centers as % of US power by 2030 | 9% | Tech-Insider, 2026 |
| US data center power consumption projection | 300 TWh by 2028; current 41 GW load | HPCwire / BigDATAwire, 2025 |
| DeepSeek V3 training cost | ~$5.6M on 2,048 GPUs | Multiple, January 2025 |
| AI reasoning cost decline post-DeepSeek | ~90% | Multiple, Q1 2025 |
| H100 cloud instance price decline | 64–75% (Q4 2024 → Q1 2026) | byteiota, 2026 |
| Nvidia GTC 2026 order book forecast | $1 trillion cumulative through 2027 | Nvidia GTC, March 2026 |
| IBM Granite 4 (Mamba-2 hybrid) inference speedup | Up to 8x vs. equivalent transformer | InfoQ / IBM, November 2025 |
| Nvidia Nemotron 3 Super active params per pass | 12.7B of 120.6B total; 7.5x throughput gain | Nvidia, 2026 |
| Sam Altman on AI overexcitement | "My opinion is yes" | August 2025 |

Disclaimer: This report is produced by PRZC Research for informational and analytical purposes only. It does not constitute investment advice, a solicitation, or a recommendation to buy or sell any security. All figures cited are attributed to third-party sources and have not been independently verified by PRZC Research. Past market reactions are not predictive of future outcomes. Readers should conduct their own due diligence before making any investment decision.
