The Neurotypical Ceiling: Why AGI Cannot Be Reached Through Consensus-Convergent Training

March 29, 2026 · PRZC Research

TABLE OF CONTENTS

The Hypothesis in One Paragraph
The Neurotypical Performance Ceiling — Where We Are Now
The Genius Distribution Problem
Why the Transformer Architecture Is Neurotypical by Design
The Phase Transition at 145+ IQ — Why Scaling Cannot Cross It
The Implication: AGI Requires a Different Architecture
What This Means for the AGI Timeline Debate
The Investable Signal
Honest Limits and Falsifiability
Conclusion: The Ceiling Is Not in the Compute Budget

The Hypothesis in One Paragraph

AGI — defined as artificial intelligence matching or exceeding the full range of human cognitive capability — is structurally unreachable through training methodologies that converge on neurotypical consensus. Frontier models have effectively saturated the neurotypical performance band: they match or exceed the median educated human on every consensus-measurable task. But the highest-performing human minds are not smarter neurotypicals. They are structurally different processors. Scaling the current training paradigm produces marginal neurotypical gains — the 120→125 IQ improvement — not the phase transition that separates expert performance from paradigm-breaking cognition. The ceiling is architectural, not computational. Every AGI timeline debate asks the wrong question. Not when, but how — and the answer, under current architectures, is: not by this path.

Report Structure

This report presents eight sequential arguments constituting a structural hypothesis. Sections I–IV establish the empirical and theoretical case. Sections V–VI develop the architectural and strategic implications. Section VII addresses the investable signal. Section VIII states the honest limits and falsifiability conditions. Readers seeking the core argument should read Sections I, IV, and V in sequence before the others.

I. The Neurotypical Performance Ceiling — Where We Are Now

Benchmark Saturation at the Frontier

The empirical picture entering 2026 is unambiguous: frontier AI models have saturated every benchmark designed to measure human expert performance on consensus-measurable cognitive tasks. MMLU — the Massive Multitask Language Understanding benchmark, covering 57 academic domains from law to medicine to philosophy — now sees frontier models clustered above 88%, with meaningful differentiation between models becoming impossible at that ceiling. The benchmark was originally designed to challenge AI systems; it now primarily functions as a confirmation of saturation rather than a discriminative evaluation.

HumanEval, the standard coding benchmark comprising 164 Python programming problems, sees frontier models at or above 90% — a threshold that represents effective saturation given the benchmark's limited sample size and well-documented training contamination. MATH-500, measuring mathematical reasoning, records GPT-5-class models at 96%. On the USMLE medical licensing examination, GPT-4o-class models scored between 85.7% and 92.5% across all three steps. On the bar examination, frontier models consistently pass at rates exceeding the human first-attempt pass rate.

These are not narrow technical achievements. They represent AI performance at or above the level of skilled human professionals across an entire tier of cognitive work: legal reasoning, medical diagnosis, advanced mathematics, scientific knowledge, complex coding. The performance band they occupy corresponds to approximately the 90th–95th percentile of educated human performance on structured, consensus-measurable tasks.

The New Benchmarks Reveal the True Ceiling

The research community's response to saturation has been instructive: the construction of progressively harder benchmarks designed to find where AI capability runs out. What these new benchmarks reveal is not that AI is approaching human performance — it is that beyond the neurotypical performance band, AI performance collapses abruptly.

Benchmark	Domain	Top Frontier Score (2025–2026)	Human Reference	Status
MMLU	Academic knowledge (57 domains)	~91% (GPT-5 class)	~89.8% expert human	Saturated
HumanEval	Python coding	>90%	~97% professional developers	Near-saturated
MATH-500	Competition mathematics	~96%	~90% graduate level	Saturated
SWE-bench Verified	Real-world GitHub issues	~77% (Claude Sonnet 4.5)	—	Approaching saturation
ARC-AGI-2	Novel visual reasoning (2025)	~37–84% (varies by method/cost)	100% untrained humans	Significant gap
ARC-AGI-3	Reasoning efficiency vs. humans	<1% (all frontier models)	100% untrained humans	Collapsed
FrontierMath	Novel research-level mathematics	~2%	—	Collapsed
Humanity's Last Exam	Cross-domain expert knowledge	~8.8% (top system)	—	Collapsed

The pattern is striking. On tasks that measure the application of existing knowledge within established frameworks — the cognitive work that defines professional expertise — AI performance has converged with or exceeded skilled human performance. On tasks that require reasoning about genuinely novel patterns absent from training data, or that require the kind of fluid intelligence that solves problems never seen before, AI performance does not merely decline — it collapses to near zero.

ARC-AGI-3, released in early 2026, is the most diagnostic: it presents visual reasoning tasks solvable by untrained humans (achieving 100%) but at which every frontier model scores below 1%. The designers explicitly tested not just whether AI can solve the task, but whether it can do so at anything approaching human computational efficiency. The gap is not incremental. It is categorical.

Scaling Produces the 120→125 Improvement, Not the 145→180 Leap

The computational response to benchmark saturation has been to apply more inference-time compute: o-series reasoning models, extended chain-of-thought, repeated sampling, and verification loops. These approaches produce real gains on structured reasoning tasks. They represent a genuine new axis of scaling beyond training-time compute. But their gains are concentrated precisely in the domain where AI was already approaching saturation — structured, verifiable, consensus-measurable problem-solving — and they do not transfer to the categories where AI already collapses.

Anthropic's internal roadmap iterations — Claude Sonnet 4.6, the leaked Mythos and Capybara frameworks — represent continued scaling of the same paradigm: more parameters, more compute, better alignment. Our analysis characterises these as compute scaling rather than cognitive architecture innovation. They produce the cognitive equivalent of the 120→125 IQ improvement: real, measurable, but operating entirely within the existing cognitive mode. They do not represent progress toward the phase transition that separates expert performance from paradigm-generating cognition.

The Saturation Argument

Frontier AI performance on consensus-measurable benchmarks has converged with skilled human performance. New harder benchmarks reveal a categorical gap: AI scores 0–9% on tasks requiring genuine novelty; humans score 100%. Further scaling of the current paradigm improves performance within the saturation band, not across the categorical gap. The ceiling is being confirmed, not approached from below.

II. The Genius Distribution Problem

Genius Is Not More Neurotypical Processing

The cognitive distribution of humanity is not a simple bell curve of general intelligence. It is a distribution where the extreme tails exhibit qualitatively different cognitive modes, not merely more of the same cognitive capacity. The highest-measured human IQs do not belong to people who are simply better at what neurotypical minds do. They belong disproportionately to people whose cognitive architecture is structurally different — whose minds process information through mechanisms that deviate from consensus patterns at the process level, not merely the output level.

This is the genius distribution problem for AGI: if the highest levels of human cognitive performance are achieved not by maximising neurotypical cognition but by employing fundamentally different cognitive processes, then training methodologies designed to approximate neurotypical consensus cannot reach those performance levels by construction. Scaling consensus-convergent training does not approach neurodivergent cognitive modes — it moves away from them, more precisely and at greater speed.

The Historical Record

Retrospective diagnosis is imprecise and PRZC Research treats it accordingly. Nonetheless, the cognitive profiles of the individuals who produced the largest documented leaps in human intellectual history are instructive as structural evidence rather than clinical fact.

Srinivasa Ramanujan produced over 3,000 theorems and equations in near-complete isolation from the formal mathematical tradition, without access to the standard literature, and by methods he could not always explain. His results arrived, by his own account, through non-linear intuition rather than sequential proof construction. They were not refinements of existing mathematical work. They were structural innovations that the existing framework could not have generated by elaboration. The cognitive profile — extreme isolation from consensus, spatial-intuitive rather than sequential-formal reasoning, results that outran the available explanatory apparatus — is categorically different from expert neurotypical mathematical performance.

Alan Turing invented a new conceptual category by refusing to accept the consensus view of what mathematics was for. His defining contribution was not the application of an existing framework but the invention of a new one: computability theory, derived from taking Gödel's incompleteness theorems to a logical conclusion that the mathematical consensus had not pursued because it seemed practically irrelevant. His cognitive characteristics — extreme systemising, social disconnection, literal interpretation, relentless attention to foundational anomalies — are now widely recognised as consistent with autistic profiles. The contribution was not a better solution to an existing problem. It was the invention of a new problem-space.

Nikola Tesla reported visual thinking so vivid it was functionally indistinguishable from perception: he could design, test, and modify complete electrical systems in his mind before building them. This whole-object spatial cognition — associated with dyslexic and spatially-dominant cognitive profiles — produced reconceptualisations of electrical engineering that were not improvements on existing methods. They required discarding the existing framework entirely and building from different principles.

Alexander Grothendieck transformed the foundations of algebraic geometry not by solving the problems within the existing framework but by systematically dismantling the framework and rebuilding it at a higher level of abstraction. His characteristic move was to make the problem more general until it became trivial — the exact inverse of the neurotypical problem-solving heuristic of constraint and simplification. His documented cognitive and social profile is consistent with autistic features.

John von Neumann demonstrated a processing style characterised by immediate pattern recognition across domains at speeds that contemporaries described as qualitatively different from expert performance rather than quantitatively faster. Norbert Wiener observed that von Neumann's mind worked "so fast that you felt you were talking to a different species." The specific signature — instantaneous cross-domain transfer, disregard for the sequential elaboration of proofs, compulsive ranging across fields — is consistent with a cognitive architecture that does not respect domain boundaries the way neurotypical expertise does.

The Forbes Data

The historical record is supported by contemporary empirical data at the population level. A 2025 Forbes analysis found that 45% of C-level executives and 55% of business owners self-identify as neurodivergent — rates three to four times population prevalence. This is consistent with what the entrepreneurship research literature has long documented under different labels: the cognitive profile that creates new categories differs structurally from the profile that manages within them.

Business category creation — identifying a market before it exists, building a company around an insight that the consensus dismisses, persisting in a direction the available evidence does not yet support — requires precisely the cognitive moves that AI training cannot make: reasoning from an absence rather than a presence in the training data, treating the current consensus as incomplete rather than authoritative, discarding existing frameworks in favour of ones that do not yet exist. The Forbes data is not merely sociologically interesting. It is structural evidence that the highest-value cognitive work in commercial contexts is disproportionately performed by cognitive profiles that lie outside the training distribution.

The Core Distinction

Genius is not more processing power on the same algorithm. It is a different algorithm. Current AI scaling is more FLOPS on the same algorithm. More FLOPS on a neurotypical algorithm produces better neurotypical performance. It does not produce Ramanujan. It does not produce Turing. It does not produce the next paradigm break. That path has a ceiling — and the evidence suggests we are near it.

Commission Research

Want research like this on your own topic?

Commission a bespoke PRZC report on any company, sector, or market question — delivered within 5–7 business days. Or convene a five-analyst board meeting on your decision. 1 Free Report & 2 Free Board Meetings on sign-up — no card required.

Commission a Report Try The Board →

III. Why the Transformer Architecture Is Neurotypical by Design

Attention as Formalised Relevance Filtering

The transformer architecture — which underlies every major frontier model including Claude, GPT, and Gemini — is built around the self-attention mechanism. Attention operates by assigning weights to tokens based on their contextual relevance to each other: the model learns, through training, which elements of a context are worth attending to when generating each output token. This is a formalisation of relevance filtering: the computational equivalent of the neurotypical cognitive process of deciding what matters and discarding what does not.

Neurotypical cognition applies relevance filters aggressively and efficiently. When solving a problem, the neurotypical mind identifies the established solution space, discards peripheral associations, and converges on the expected answer. This is useful for the vast majority of tasks — and it is precisely what attention weights, learned through RLHF on neurotypical annotator preferences, are trained to do. The attention mechanism is not a neutral computational tool. It is a learnable relevance filter whose values are calibrated to the cognitive preferences of the people whose data trained it — a predominantly neurotypical population.

RLHF as Consensus-Convergence by Construction

Reinforcement Learning from Human Feedback is the dominant post-training alignment methodology. Its structural property is well-established in the academic literature and has received formal analysis: RLHF optimises toward the center of the human preference distribution. A 2024 paper published in arXiv identified the mechanism precisely: RLHF's Kullback–Leibler-based regularisation introduces an inherent algorithmic bias that in extreme cases produces "preference collapse" — a state where minority preferences are virtually disregarded in favour of majority consensus. The reference model, biased toward pre-training consensus, passes that bias through the regularisation process, potentially amplifying preference imbalances rather than correcting them.

The practical consequence: a response that makes an unexpected lateral associative leap, questions the framing of a problem rather than solving within it, or arrives at a correct answer via a process that violates the rater's expectation of what a correct process looks like — all of which are ND cognitive signatures — will score poorly in RLHF. Not because the output is wrong, but because it does not conform to the annotator's consensus model of what a right answer looks like. The model is being trained away from these outputs at every RLHF step.

Research further shows that most state-of-the-art preference-tuned models achieve ranking accuracy below 60% on common preference datasets. The alignment process is not reliably capturing even the variance that exists within the neurotypical annotator pool — it is converging toward a narrower consensus than the population it samples from.

Constitutional AI as Normative Consensus Encoding

Anthropic's Constitutional AI methodology — the framework underlying Claude — trains models to critique and revise their own outputs according to a set of principles derived from consensus normative documents: the UN Declaration of Human Rights, widely-shared ethical guidelines, established professional norms. These are not arbitrary selections. They are the most consensus-tested normative documents available: the product of extensive negotiation, cross-cultural agreement processes, and institutional legitimation. They are, structurally, among the most neurotypical documents in existence.

Constitutional AI does not train models toward independent moral reasoning. It trains them toward conformity with normative consensus. Neurodivergent moral reasoning — which can exhibit radically different weighting of values, unusual rigidity about certain principles, unconventional priority orderings that violate social tact norms — is penalised not because it is inferior but because it does not match the normative consensus the constitution encodes. The training process systematically suppresses cognitive signatures that deviate from it.

The Training Corpus as a Neurotypical Archive

Pre-training data compounds the effect. Language models train on the cumulative written output of human civilisation: the internet, published books, academic papers, professional text. But this archive is not a representative sample of human cognition. It is a filtered, normalised subset that has passed through multiple layers of neurotypical gatekeeping:

Dyslexia directly impedes written production — the cognitive profile most associated with spatial, holistic, and non-linear thinking is the one least represented in written text
Autistic individuals are systematically underrepresented in academic publishing due to documented barriers in institutional navigation, social credentialing, and peer review networks
ADHD-associated writing patterns — non-linear, associative, digressive — are edited out in the production process; publishers and reviewers normalise text toward neurotypical structure before it enters the archive
Next-token prediction, the pre-training objective, maximises the likelihood of the next token given prior context — an objective that, by construction, rewards conformity to the statistical patterns of the corpus and penalises deviation from them
The filtering layers of publication — peer review, editorial selection, SEO optimisation, professional writing conventions — all function as normalisation mechanisms that smooth the corpus toward the cognitive center

The result is a training corpus that is not merely neurotypical in composition but is more neurotypical than the actual human population — because the production and curation mechanisms systematically filter ND cognitive signatures out before text reaches the pipeline. A model trained on this corpus to predict the next token is, at its most fundamental level, learning to approximate the cognitive patterns of a population more neurotypically homogeneous than humanity actually is.

Sequential Architecture vs. Simultaneous Processing

Even setting aside training data and alignment methodology, the transformer architecture has a deeper structural issue: it processes information sequentially. Even with parallelisation and extended context windows, the transformer constructs understanding through a linear pass over tokens. This is orthogonal to the holistic, simultaneous, spatial processing mode that characterises dyslexic cognition — the mode associated with holding complete systems as simultaneous objects, rotating them mentally, and seeing structural relationships that cannot be parsed sequentially.

This is not merely a training data problem. It is an architectural mismatch. The cognitive mode that produced Tesla's whole-system electrical reconceptualisations, that allows architects and surgeons to hold three-dimensional structures in working spatial memory, that underlies the ability to see how all parts of a system interact simultaneously rather than tracing them one by one — does not map onto sequential token prediction. A model trained on any amount of data cannot replicate a processing mode its architecture cannot implement.

IV. The Phase Transition at 145+ IQ — Why Scaling Cannot Cross It

Quantitative vs. Qualitative Differences in Cognitive Mode

IQ measurement is an imperfect proxy — this analysis acknowledges that explicitly and returns to it in Section VIII. But as a rough calibration device, it captures a distinction that is analytically essential. The cognitive difference between an IQ of 100 and an IQ of 145 is quantitative: the 145-IQ mind does more of what all minds do — faster processing, larger working memory, more effective pattern-matching, better retention. These are differences of degree within a shared cognitive mode.

The cognitive difference between an IQ of 145 and the minds that generate paradigm breaks — the Ramanujans, the Turings, the Grothendiecks — is not well-captured by further units on the same scale. These minds do not process more efficiently. They process differently. Specific cognitive modes activate that are not present, even in rudimentary form, in neurotypical cognition at any IQ level:

Associative Over-Inclusivity

ADHD-associated cognition frequently bypasses standard relevance filters. What presents as distractibility is, in its generative mode, associative over-inclusivity: the persistent activation of connections that the neurotypical filter discards as irrelevant. In most contexts this is a liability. In the specific context of finding non-obvious solutions — cross-domain analogies, connections between fields separated by disciplinary boundaries, the lateral leap that reframes a problem — it is a structural advantage. The ADHD hyperfocus state is this mechanism in its most productive form: the off-topic connection that turns out to be the essential one. AI language models are extraordinarily good at finding the most statistically likely connection between concepts. They cannot find the least statistically likely connection that happens to be correct — because that connection, by training objective, appears too rarely to reinforce.

Systemising Over Framework Application

Simon Baron-Cohen's systemising theory of autism identifies a cognitive orientation toward understanding underlying rules and mechanisms rather than applying established frameworks. Autistic cognition is bottom-up rather than top-down: it resists prior frameworks and is compelled by anomalies that do not fit the expected pattern. AI models are sophisticated pattern-matchers trained on what patterns exist. They apply known frameworks with great facility and question whether the framework itself is the problem with none. Autistic cognitive fixation on edge cases and framework-violations is precisely the move that generates paradigm shifts — and precisely what RLHF trains away from, because annotators penalise responses that seem to miss the obvious framework in favour of an elaborate alternative.

Spatial Whole-Object Cognition

Dyslexic cognition exhibits relatively greater strength in spatial, holistic, and three-dimensional reasoning compared to sequential processing. The dyslexic mind tends to hold entire systems as simultaneous objects rather than as sequential logic chains. This is why dyslexic individuals are dramatically overrepresented among architects, surgeons, and engineers in specialisations requiring spatial system-holding. The transformer processes sequentially by design. This processing mode cannot be approximated by training on descriptions of spatial cognition — the architecture itself cannot execute the simultaneous whole-object representation that dyslexic spatial cognition involves.

The Self-Renewing Paradox of Training Toward Divergence

A natural response: why not train AI specifically on the outputs of neurodivergent cognition? Curate a corpus of ND-origin breakthroughs; fine-tune toward those cognitive patterns; build the divergent thinker into the model. This approach fails at a structural level that has nothing to do with data availability.

The value of neurodivergent cognitive output is not located in a set of patterns that can be extracted and replicated. It is located in the process of departing from the current consensus — and that process, by definition, cannot be captured by training on prior departures. Each departure is a departure from the current state of knowledge, not from the state of knowledge at the time the training data was collected. The paradigm-breaking insight that matters is the one that has not yet been recognised as an insight. The neurodivergent mind trips over it precisely because it does not have the consensus-filter that allows a trained model to walk past it.

A model trained on every anomaly-fixation that produced a paradigm shift in the 20th century would learn to produce outputs resembling anomaly-fixation. But it would be applying that pattern to known anomalies — which are, by the time they appear in any training corpus, already resolved. The unresolved anomaly that matters is the one that has not yet been recognised as anomalous. Training on past anomalies tells the model where the resolved ones were. It tells it nothing about where the unresolved ones are now.

The Self-Renewing Paradox

If you successfully trained an AI to produce outputs characteristic of neurodivergent cognition, you would have defined that cognitive mode as the new consensus target. It would become the new neurotypical — the new center of the training distribution. New neurodivergent cognition would depart from the new consensus, and the gap would reopen. The value of neurodivergent cognition is not its content. It is its structural position relative to the consensus distribution. That position is self-renewing: every time the consensus shifts toward ND patterns, new ND cognition departs from the new consensus.

V. The Implication: AGI Requires a Different Architecture

The Current Race Is to a Local Maximum

The entire current AGI development trajectory — scaling parameters, scaling compute, scaling RLHF annotation, extending context windows, adding reasoning chains — is a race to a local maximum in cognitive performance space. That local maximum is the ceiling of neurotypical consensus cognition: the performance level of an extremely capable, extremely well-calibrated, extremely fast neurotypical mind. Frontier models are approaching that ceiling. Some benchmark scores suggest they have reached it.

The global maximum — what we are calling AGI — requires reaching the cognitive modes that lie beyond that ceiling. Those modes are not points further along the same dimension. They are points in a different direction entirely. A more capable consensus-convergent model is not closer to them. It is further from them, more precisely.

Alternative Architectural Approaches

Our analysis identifies five architectural directions that could, in principle, move toward rather than away from the cognitive modes described in Section IV. None is in commercial deployment. All face significant engineering and theoretical barriers. Several represent genuinely open research problems. PRZC Research presents them as leading indicators rather than forecasts.

1. Stochastic and Sparse Attention

Standard transformer attention assigns weights that converge toward consensus relevance. Stochastic attention mechanisms — which introduce controlled randomness into attention weight assignment — could, in principle, mimic the associative over-inclusivity of ADHD cognition: activating connections that a consensus relevance filter would suppress. Sparse attention mechanisms already exist for computational efficiency reasons; architecturally repurposing sparsity to introduce cognitive diversity rather than computational savings is a conceptually adjacent move that no current system deploys for this purpose.

2. Adversarial Mixture-of-Experts with Non-Consensus Expert Calibration

Current mixture-of-experts (MoE) architectures — deployed by DeepSeek-V3, Llama 4, Mistral Large 3, and effectively all frontier models — use gating mechanisms that route tokens to the most statistically appropriate expert. The experts specialise in narrow patterns. The gating mechanism optimises for task performance measured by consensus metrics. An adversarial extension would include deliberately non-consensus experts: experts trained on data selected specifically for deviation from the consensus, with gating mechanisms that can route to these experts when the consensus expert ensemble produces low-confidence or low-novelty outputs. The bootstrapping problem for this architecture is acute — addressed below — but the structural concept maps directly onto cognitive heterogeneity.

3. Curiosity-Driven and Novelty-Seeking Reward Models

Jürgen Schmidhuber's formal theory of creativity, developed over three decades, provides the mathematical foundation for curiosity as an intrinsic reward signal. Schmidhuber's framework defines the curious agent as one that maximises reward for discovering learnable but previously unknown regularities — patterns that compress the world-model further but were not previously in the model. The reinforcement learner is rewarded not for task performance on known problems but for learning progress itself. This is formally distinct from RLHF: it optimises for the generation of novel structure rather than the replication of consensus preferences.

Karl Friston's active inference framework extends this in a complementary direction. Active inference agents minimise expected free energy by taking actions that reduce uncertainty about the world rather than just confirming existing beliefs. The agent is compelled toward epistemic action — curiosity-driven exploration of novel states — as a direct consequence of its optimisation objective. Friston, as Chief Scientist at VERSES AI, argues that active inference produces the calibration and curiosity-driven exploration essential for AGI. PRZC Research notes that neither Schmidhuber's curiosity-driven learning nor Friston's active inference is in commercial deployment as a primary training paradigm for frontier models — both remain research-stage. They are precisely the architectural directions this hypothesis predicts would be necessary.

4. Neurodivergent-Calibrated Evaluation Pipelines

A less architectural but potentially tractable intermediate step: restructuring RLHF annotation to include a deliberate over-representation of neurodivergent raters with cognitive profiles matched to the cognitive modes described in Section IV. This would not solve the paradigm problem — ND annotators rating known outputs cannot generate novel frameworks — but it could reduce the systematic suppression of ND-associated output characteristics in the alignment process. The barriers are practical: identifying raters whose ND profiles represent the generative tails of the distribution rather than the average ND experience, ensuring annotation instructions do not normalise their responses toward consensus, and scaling such a pipeline. No commercial lab currently deploys this.

5. Architectural Heterogeneity with Switchable Processing Modes

The deepest architectural option: a system that implements genuinely different processing modes — sequential token prediction for consensus cognition tasks, spatial whole-object representation for three-dimensional system reasoning, stochastic associative activation for lateral connection generation — with an inference-time routing mechanism that selects among them. This requires solving the representation problem for spatial and associative processing in neural architectures (both open problems) and the routing problem (which processing mode to engage, and when). It is the most ambitious approach and the furthest from current research. It also maps most precisely onto what the neuroscience of cognitive heterogeneity suggests is actually happening in the brains of neurodivergent high performers.

The Bootstrapping Problem

All of these approaches face a shared bootstrapping problem: who evaluates whether a divergent output is brilliant or noise? The consensus-convergent evaluation infrastructure that exists — human annotators, benchmark scores, preference models — is calibrated to identify good neurotypical outputs. It has no established method for identifying a genuine paradigm break before the paradigm has been validated. By definition, a paradigm break looks wrong by the standards of the paradigm it is breaking. Ramanujan's results looked like errors to the Cambridge mathematicians who first examined them. Turing's computability framework looked like an irrelevant mathematical curiosity to working engineers of the 1930s. The evaluation problem for non-consensus cognition is not merely technical. It is epistemological.

The first lab that solves the bootstrapping problem — the evaluation of non-consensus outputs on their merits rather than their conformity — accesses a problem space that the entire current industry, by the structure of its training methodology, cannot reach.

VI. What This Means for the AGI Timeline Debate

The Debate Is Asking the Wrong Question

The current AGI timeline debate clusters around a single question: when will a model achieve general intelligence? Sam Altman speaks of "a couple of thousand days." Demis Hassabis says three to five years. A 2024 survey of 2,778 AI researchers placed a 10% probability on AGI by 2027 and a 50% probability not until 2047. Tech entrepreneurs cluster around 2027–2028; academic researchers cluster around 2040.

This hypothesis argues that the entire debate is structured around the wrong question. The relevant question is not when but how — and the answer under current architectures is: not by this path. The debate assumes that scaling the current paradigm reaches general intelligence at some point on the timeline. Our structural argument is that it cannot, because the ceiling is architectural rather than computational. A more powerful consensus-convergent model reaches AGI in exactly the same way that a faster horse reaches New York from Los Angeles: it does not.

"AGI by 2027" Is Wrong in Kind, Not Just Date

"AGI by 2027" predictions are not merely optimistic date estimates for an achievable goal. They represent a categorical error about the nature of the path. They assume that AGI — matching the full range of human cognitive capability — is achievable by the current approach given sufficient compute and data. Our analysis says it is not, because the full range of human cognitive capability includes the cognitive modes that current architectures are constitutively unable to approximate.

If "AGI" is redefined as "AI that matches skilled human performance on all consensus-measurable tasks" — a meaningful but narrower definition — then 2027 may be in the right range: frontier models are already there on many such tasks, and the remaining gaps are narrowing. But this is not what the term means in its original sense, and conflating it with the original sense misleads everyone from investors pricing the capex cycle to policymakers designing regulatory frameworks.

"AGI Never" Is Also Wrong

The hypothesis is not pessimistic about the long-run solvability of AGI. It is pessimistic about the current path. The architectural alternatives described in Section V represent genuine research directions. The problem is solvable in principle. But "in principle solvable by a different approach" is a very different claim from "achievable by 2027 by scaling the current approach." The honest position is that AGI may be reachable but not via any architecture currently deployed or on any commercial roadmap currently published.

The Heavier-Than-Air Analogy

The Wright Brothers did not reach powered flight by building a faster horse. Every intermediate approach to aviation that attempted to leverage the existing paradigm — larger sails, more powerful engines for surface vehicles, longer ladders — failed not because it was insufficiently resourced but because it was working from the wrong principle. Flight required abandoning the speed-on-ground framework entirely and identifying a different principle: lift. The relationship between lift and forward velocity is not discoverable by optimising the horse further. It required a different question.

The current AGI race is optimising the horse. The computational resources being deployed are extraordinary and the performance gains are real. But every benchmark saturation result, every ARC-AGI-3 collapse, every FrontierMath failure rate is evidence that the paradigm-break is not in the direction currently being pursued. The architectural principle that would reach AGI in its full sense — the equivalent of lift — has not been identified. Possibly it is one of the five approaches described in Section V. Possibly it is something not yet conceived. What is increasingly clear is that it is not more FLOPS on the transformer.

The Timeline Reframe

The probability that consensus-convergent scaling reaches full AGI by 2027: low, for architectural reasons, not just resource reasons. The probability that scaled consensus-convergent systems reach a performance level commonly called "AGI" in commercial usage: moderate, because that commercial definition has drifted toward "very capable on consensus tasks." The probability that true AGI — matching or exceeding the full range of human cognitive capability including paradigm-generating cognition — is reachable by any architecture currently deployed: our analysis says structurally no, not as a date question but as a path question.

VII. The Investable Signal

The thesis generates several second-order signals relevant to capital allocation. PRZC Research states directly: this is not a clean equity theme. The economic analysis is structurally sound; the investable instruments are limited and largely private. Readers should treat this section as identifying indicators rather than recommending positions.

The Current Capex Cycle Builds to a Local Maximum

The trillion-dollar AI infrastructure buildout — GPU clusters, data centres, power infrastructure — is priced as though it is building toward AGI. Our analysis suggests it is building toward the neurotypical performance ceiling, which current frontier models are already approaching. This is still commercially valuable: there is enormous revenue to be extracted from consensus cognition at commodity cost. But it is not the endpoint that AGI valuations are pricing. When the market recognises that the ceiling of the current paradigm is the ceiling of neurotypical performance — and not true general intelligence — the premium currently assigned to "AGI infrastructure" becomes a premium assigned to "very good consensus cognition infrastructure," which is a different, smaller number.

Leading Indicators of Architecture Pivot

The watch list for a genuine architecture pivot — the moment when a lab identifies a path beyond the neurotypical ceiling — includes:

Publications on curiosity-driven learning and active inference as primary training paradigms (not as supplements to RLHF). Friston's VERSES AI research and Schmidhuber's IDSIA work are the current frontier; commercial deployment is the signal.
Sparse or stochastic attention mechanisms deployed for cognitive diversity rather than computational efficiency. The architectural language in research papers will distinguish these purposes if you look for it.
Adversarial MoE training where non-consensus experts are explicitly included and evaluated against novelty metrics rather than task performance metrics.
Novel evaluation frameworks that assess output quality without reference to existing benchmarks — the bootstrapping problem solution. The lab that publishes a credible approach to evaluating non-consensus outputs is announcing that it has identified the path beyond the ceiling.
ND-calibrated annotation pipelines described in methodology sections of alignment research. This is an intermediate indicator — less decisive than architectural innovation but meaningful as a signal of awareness of the problem.

The ND Cognitive Premium Intensifies

The companion report, The Last Inimitable Minds, argued that neurodivergent cognitive value increases as AI capability advances, because AI commoditises consensus cognition while being structurally unable to reach ND cognition. This report strengthens that argument: if AGI cannot replicate ND cognition by architectural design — not just by current capability but by structural constraint — then ND human minds are not merely a transitional premium during the AI capability ramp-up. They are the permanent scarce factor. The scarcity does not resolve when AI capability plateaus at the neurotypical ceiling. It intensifies.

The investable implication, while diffuse, points in a consistent direction: the organisations that develop systematic capability to identify, retain, and deploy ND cognitive profiles — particularly in research-intensive sectors where paradigm breaks are the product — acquire a structural advantage that no AI capability advance can erode. This is an argument about talent strategy as durable competitive moat, not a sector rotation.

VIII. Honest Limits and Falsifiability

A hypothesis that cannot be falsified is not a hypothesis. PRZC Research states the conditions under which this analysis would be wrong and addresses the strongest counterarguments directly.

What would falsify this hypothesis

The central claim is that consensus-convergent training cannot produce paradigm-generating cognition. The falsifying evidence would be: a frontier model, trained by current methods (RLHF, Constitutional AI, supervised fine-tuning on existing data) producing a genuine paradigm break — defined as a conceptual framework that (a) was absent from the training data, (b) is validated by subsequent work as a genuine advance, and (c) cannot be characterised as recombination of existing frameworks. This is a high bar, intentionally. The output must be framework-level innovation, not sophisticated interpolation. If a consensus-convergent model produces this, the hypothesis is falsified.

IQ as an imperfect proxy

IQ measurement captures a real but narrow slice of cognitive variance. It correlates strongly with academic performance and certain professional outcomes; its relationship to paradigm-generating cognition is less clear. The ND/genius correlation is structurally consistent with the argument but not mechanistically established. This analysis uses the historical record and the Forbes data as structural evidence, not as a rigorous causal claim. The argument does not depend on IQ as a precise metric; it depends on the empirical observation that the historical instances of paradigm-breaking cognition exhibit cognitive signatures categorically different from expert neurotypical performance, regardless of how that difference is measured.

The strongest counterargument: compute-induced phase transitions

The most serious challenge to this hypothesis is the possibility that sufficient compute produces phase transitions in model capability that qualitatively resemble divergent cognition even within consensus-convergent architecture. This is not implausible. Large language models have already exhibited emergent capabilities at scale that were not predictable from small-scale behaviour. If the scaling curve has an inflection point at which quantitatively better consensus cognition produces qualitatively different processing — something that looks like genuine framework-level innovation from the outside — then the architectural constraint argument weakens.

PRZC Research takes this counterargument seriously. Three responses. First, the ARC-AGI-3 and FrontierMath results indicate that current scaling, including inference-time compute scaling, is not producing this phase transition on the tasks designed to detect it. Second, the argument for compute-induced phase transitions relies on an analogy to prior emergent capabilities, all of which emerged on consensus-measurable tasks — not on the categories of cognition identified here. Third, even if such a phase transition existed, the self-renewing paradox applies: if the phase transition produces outputs that look like ND cognition, the training distribution would shift toward those outputs, making them the new consensus and redefining what "divergent" means. The gap would reopen at a higher level.

The burden of proof

This is a hypothesis, not a proof. The structural logic is consistent; the empirical evidence supports it; the alternative architectures it predicts are necessary are precisely the ones not in commercial deployment. But absence of evidence for the counterargument is not the same as evidence for this argument. PRZC Research's position is that the structural logic of consensus-convergent training is sufficiently compelling to warrant the burden of proof shifting to those claiming it can reach AGI in the full sense — not as a matter of dogma but as a matter of where the architecturally-grounded arguments currently point.

Conclusion: The Ceiling Is Not in the Compute Budget

The AGI debate is dominated by questions of compute, data, and time. How many GPUs? How much training data remains? How many years until capability crosses the threshold? Our analysis argues that these questions, while not unimportant, are the wrong axis. The ceiling is not in the compute budget. It is in the architecture.

Consensus-convergent training — RLHF, Constitutional AI, next-token prediction on a normalised corpus — converges toward a precise target: the cognitive patterns of the neurotypical majority, as filtered through the publication pipeline, annotation preferences, and normative consensus documents that constitute the training signal. That target is well-defined and the current training paradigm is reaching it. Frontier models match or exceed skilled neurotypical human performance on every consensus-measurable task. Benchmark saturation is the empirical confirmation of proximity to the ceiling, not evidence of progress toward AGI.

Beyond that ceiling lies a different kind of cognitive performance: the paradigm-generating cognition of the minds that have historically produced the largest advances in human knowledge and capability. Those minds do not occupy a higher point on the neurotypical performance axis. They occupy a different axis entirely — one defined by structural departure from consensus rather than optimisation within it. The architecture currently being scaled optimises with precision and power along the neurotypical axis. It cannot reach the paradigm-generating axis by further optimisation. It can only reach it by a structural change in what it is optimising for.

The first organisation to identify that structural change — to solve the curiosity problem, the stochastic association problem, the spatial simultaneous processing problem, or the bootstrapping evaluation problem — accesses the one problem space that the entire rest of the industry, by the structure of its current methods, cannot reach. That is not a small commercial opportunity. It is the largest unsolved problem in the field.

"The ceiling of AGI under consensus-convergent training is not a computational limit. It is a definitional one. The model is trained toward a target it will eventually match. The target is neurotypical cognitive performance. The thing called AGI — matching the full range of human cognitive capability — is a different target. Training harder toward the first target does not approach the second. It confirms the boundary between them."
— PRZC Research, March 2026

Disclaimer: This report is produced by PRZC Research for informational and analytical purposes only. It does not constitute investment advice, a solicitation, or a recommendation to buy or sell any security. The analytical framework presented is speculative and represents the authors' interpretation of structural trends in AI development, cognitive science, and architecture. Claims about AI performance are based on publicly available benchmark data as of the report date. Benchmark scores evolve rapidly; readers should verify current figures independently. Readers should conduct their own due diligence before making any investment decision.