Article · The AI Layer

The jagged frontier: where AI compounds value, and where it quietly destroys it.

Capability mapping, hallucination, deskilling, and the AI moves that determine whether an organization captures value or generates theater.

~ 14 minutes · 20 sources · Edition 1, May 2026

In September 2023, Harvard Business School researchers published a field experiment conducted with 758 consultants at Boston Consulting Group. Half were given access to GPT-4. Half were not. The consultants then completed eighteen real consulting tasks designed to span what the researchers called the "jagged technological frontier": some tasks well within the model's competence, some just beyond it.

On tasks inside the frontier, the AI-assisted consultants outperformed their peers by 12% to 43% — a margin large enough to be visible in real client work. On tasks outside the frontier, the same consultants were nineteen percentage points more likely to be wrong than their unassisted colleagues. The phrase the authors used in the paper is the one worth keeping: they followed the AI off the cliff.^[2023]

Figure 5 · The jagged frontier. Consultants gained 12-43% inside the frontier and were 19 percentage points more likely to be wrong just outside it.

The argument of this article is that the AI layer of leadership is essentially the work of telling, in real organizational conditions, what is inside the frontier and what is just beyond it. The evidence is that this work is not glamorous, not strategic in the conventional sense, and almost entirely about avoiding two predictable failure modes: pointing AI at work where it does not compound, and letting AI do work where the human capability must remain load-bearing. Managing both sides well is what separates organizations that capture AI value from organizations that generate AI theater.

I. The Frontier Is Jagged, Not Smooth

The seductive picture of AI capability is a smooth curve: models get better every quarter; tasks become possible in sequence; the organization simply has to time its bets. The empirical picture looks different. Capability is jagged. Two tasks that look adjacent to a human eye can sit on opposite sides of the model's competence. Knowing where the line is for the specific work in front of you is not a strategic capability that can be delegated to a Chief AI Officer. It is a working capability that has to live in the leader closest to the work.

Brynjolfsson, Li and Raymond's 2025 paper in the Quarterly Journal of Economics made this concrete in a way the rest of the literature has been catching up to. Studying a call centre that deployed generative-AI assistance, they found a 14% average productivity gain — but the average concealed the structure. Novice workers gained 34%. Experienced workers gained almost nothing.^[2025] The headline number was real and the value was real, but the value was concentrated, not uniform. A leader who deployed the same tool uniformly across the function would miss most of the actual win and might destroy value at the senior end of the team — where the tool was, in effect, an unwanted second opinion.

The AI-layer move here is mapping. Map the function's work into three zones — AI-leverage, AI-neutral, AI-hostile — and allocate adoption pressure accordingly. The mapping is not permanent. It changes every time the underlying model changes, which is every few months. But the discipline of having a current map is the difference between a function that compounds AI value and one that distributes it across work where it adds nothing.
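One way to keep such a map current is to derive it from measured differences between assisted and unassisted work. The sketch below is a minimal illustration under stated assumptions: the `TaskMeasurement` fields, the `classify` function, both thresholds, and every number in `pilot` are hypothetical and do not come from the studies cited here.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    AI_LEVERAGE = "AI-leverage"
    AI_NEUTRAL = "AI-neutral"
    AI_HOSTILE = "AI-hostile"


@dataclass(frozen=True)
class TaskMeasurement:
    name: str
    quality_delta: float  # assisted minus unassisted quality (hypothetical 0-1 scale)
    error_delta: float    # assisted minus unassisted error rate (fraction)


def classify(task: TaskMeasurement, min_gain: float = 0.05,
             max_added_error: float = 0.02) -> Zone:
    """Assign a zone; both thresholds are illustrative assumptions."""
    # A meaningful rise in error rate marks the task hostile, whatever the quality gain.
    if task.error_delta > max_added_error:
        return Zone.AI_HOSTILE
    if task.quality_delta >= min_gain:
        return Zone.AI_LEVERAGE
    return Zone.AI_NEUTRAL


# Hypothetical pilot measurements, invented for illustration only.
pilot = [
    TaskMeasurement("draft product concepts", quality_delta=0.30, error_delta=0.00),
    TaskMeasurement("reformat status report", quality_delta=0.01, error_delta=0.00),
    TaskMeasurement("diagnose issue from interview notes", quality_delta=0.10, error_delta=0.15),
]

zone_map = {task.name: classify(task) for task in pilot}
for name, zone in zone_map.items():
    print(f"{name}: {zone.value}")
```

The error check runs first on purpose: a task that looks better on quality but produces more wrong answers is the "just outside the frontier" case, so it should never be labelled as leverage. Re-running the classification after each model change is what keeps the map from going stale.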

II. The Reliability Trap

One of the most counter-intuitive findings in the human factors literature, replicated for forty years, is that more-reliable automation produces worse human oversight. Parasuraman and Manzey's 2010 meta-analysis in Human Factors consolidated decades of work into a clear pattern: high-reliability automation consistently leads humans to under-monitor and to accept incorrect outputs more readily than low-reliability automation does.^[2010] The phenomenon has a name, automation complacency, and a mechanism. When a tool is right most of the time, the cost of checking starts to feel larger than the expected benefit, and the checking quietly stops.
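The mechanism can be made concrete with a minimal expected-cost sketch. It assumes a reviewer checks an output only when the expected loss from an unchecked error exceeds the cost of the check. The function `worth_checking` and both cost figures are hypothetical illustrations, not values from the cited research.

```python
def worth_checking(error_rate: float, cost_of_error: float, cost_of_check: float) -> bool:
    """Return True when the expected loss from skipping the check exceeds the check's cost."""
    return error_rate * cost_of_error > cost_of_check


# Hypothetical costs: a missed error costs 200 units, one check costs 5 units.
COST_OF_ERROR = 200.0
COST_OF_CHECK = 5.0

# As the tool becomes more reliable, the per-item case for checking weakens.
for error_rate in (0.10, 0.05, 0.02, 0.01):
    decision = worth_checking(error_rate, COST_OF_ERROR, COST_OF_CHECK)
    print(f"error rate {error_rate:.0%}: check? {decision}")
```

Under these assumed costs, checking stops paying for itself once the error rate falls below 2.5% (5 divided by 200). The errors that remain are then the ones nobody is looking for, which is why the decision to keep checking cannot be left to per-item intuition.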

Endsley's 2017 synthesis of 30 years of human-automation research in aviation, military operations and driving extends the picture: higher automation reliably degrades operator situation awareness and lengthens reaction time when manual takeover is needed.^[2017] The pattern is older than AI and independent of it. Bainbridge named it in 1983 as the "Ironies of Automation": the more automation handles routine work, the less the human practises the skills required when automation fails, and the higher the stakes when they must step in.^[1983]

The AI-specific replications have begun arriving. Dell'Acqua's 2022 field experiment with HR recruiters showed that pairing recruiters with a high-quality AI reduced their effort and attention compared to pairing them with a lower-quality one; the higher reliability triggered the complacency the prior literature predicted.^[2022] Agarwal and colleagues, working in radiology, showed that radiologists given AI predictions systematically under-weighted their own private information; the combined human+AI performance was often worse than AI alone or human alone.^[2023] The anecdotal version is the canonical one: the National Transportation Safety Board identified flight-crew over-reliance on automated systems and the degradation of manual flying skills as contributing causes of the Asiana Airlines 214 crash at San Francisco in 2013.^[2014]

The leadership consequence is that the AI layer cannot be managed by tracking AI reliability alone. It has to be managed by tracking residual human reliability at the same time. A function with an excellent AI and an atrophied workforce is one anomalous week away from catastrophe.

III. Deskilling Is Measurable Now

For most of the period covered by the human-factors literature, deskilling was something one had to argue about. The studies were careful, the case histories were specific, but the population-level measurements were scarce. That has changed.

In June 2025, The Lancet Gastroenterology & Hepatology published a multi-centre observational study of endoscopists who had been performing AI-assisted colonoscopies. When the same endoscopists performed standard, unaided colonoscopies, their adenoma detection rate dropped from 28.4% to 22.4% — a roughly 20% relative decline in cancer-relevant performance after regular exposure to AI assistance.^[2025] The measurement is the cleanest deskilling result in clinical practice to date. It is not a hypothesis about what might happen. It is a measured fact about what is happening.

Figure 3 · Measured deskilling. A ~20% relative drop in unaided performance after regular AI use.
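Absolute and relative declines are easy to conflate, so the arithmetic behind the rounded figure is worked out here using only the two detection rates quoted above.

```latex
\[
\text{absolute drop} = 28.4\% - 22.4\% = 6.0 \text{ percentage points}
\]
\[
\text{relative drop} = \frac{28.4 - 22.4}{28.4} = \frac{6.0}{28.4} \approx 0.21
\]
```

The six-point absolute drop is therefore about a fifth of the starting detection rate, which is the "roughly 20%" in the text.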

The cognitive-science literature has begun producing analogous results in knowledge work. Kosmyna and colleagues at the MIT Media Lab ran an EEG study of essay writers using ChatGPT versus working unaided; the AI-assisted writers exhibited the weakest neural connectivity and the lowest recall of what they had just written, with brain-to-brain coupling reduced by approximately 55% versus the brain-only group.^[2025] Lee and colleagues at Microsoft Research surveyed 319 knowledge workers and found higher confidence in generative AI correlated with measurably less critical-thinking effort; workers were more likely to defer judgment when they trusted the tool.^[2025] Nicholas Carr's book-length warning a decade earlier — that automation erodes tacit expertise, situational awareness, and craft across domains — now reads less as a thesis to argue and more as a forecast that has been arriving on schedule.^[2014]

The AI-layer move at the deskilling threshold — Level 4 in the PolyCognitive framework — is to catalogue the failure modes of the AI in actual use, identify which human skills are load-bearing for each failure mode, and stop staffing decisions that assume AI reliability above the evidence. Said simply: know where the AI is fragile, and protect the human capability that has to take over when it is.

IV. The Failure Modes That Are Not Hallucinations

The public discussion of AI risk has fixated on hallucination — the model generating a confident statement that is not true. It is a real failure mode. But the institutional case histories of AI-system failure suggest that most large-scale damage has not come from hallucination. It has come from systematic bias baked into training data, from validation gaps that survive deployment, and from confident operation in conditions the model was never tested against.

Obermeyer and colleagues, writing in Science, dissected a healthcare risk-management algorithm covering roughly 200 million U.S. patients. The algorithm systematically under-identified Black patients for high-risk care management: at any given risk score, Black patients were considerably sicker than white patients. The bias was not in the model architecture. It was in the choice of healthcare spending as a proxy for healthcare need.^[2019] Amazon's multi-year internal AI recruiting project, reported by Reuters in 2018, was scrapped after the system was found to penalise resumes containing the word "women's", a learned correlation from a decade of male-dominated hiring data.^[2018] Wong and colleagues, writing in JAMA Internal Medicine, externally validated Epic's widely deployed sepsis prediction model: the model missed 67% of sepsis cases and generated alerts on 18% of all hospitalised patients, a precision-recall failure that produced massive alert fatigue without the predictive value to justify it.^[2021]
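The sepsis figures show how a miss rate and an alert rate combine into alert fatigue. The sketch below is a rough illustration, not a reproduction of the study's analysis. The 67% miss rate and the 18% alert rate come from the text above. The sepsis prevalence is not stated in this article, so `ASSUMED_PREVALENCE` is an assumption supplied only to make the arithmetic runnable, and the function `alert_metrics` is hypothetical.

```python
def alert_metrics(prevalence: float, sensitivity: float, alert_rate: float) -> dict:
    """Derive alert precision from prevalence, sensitivity, and overall alert rate."""
    true_alerts = prevalence * sensitivity   # share of all patients correctly alerted
    false_alerts = alert_rate - true_alerts  # share of all patients alerted without sepsis
    if true_alerts <= 0 or false_alerts < 0:
        raise ValueError("inputs are mutually inconsistent")
    return {
        "precision": true_alerts / alert_rate,
        "false_alerts_per_true_alert": false_alerts / true_alerts,
    }


SENSITIVITY = 1 - 0.67     # from the text: the model missed 67% of sepsis cases
ALERT_RATE = 0.18          # from the text: alerts on 18% of hospitalised patients
ASSUMED_PREVALENCE = 0.07  # ASSUMPTION for illustration; not given in this article

metrics = alert_metrics(ASSUMED_PREVALENCE, SENSITIVITY, ALERT_RATE)
print(f"precision: {metrics['precision']:.2f}")
print(f"false alerts per true alert: {metrics['false_alerts_per_true_alert']:.1f}")
```

Under that assumed prevalence, roughly one alert in eight would point at a real sepsis case. A different prevalence changes the ratio, so the operating discipline is to recompute it against the local patient population instead of trusting the vendor's validation set.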

The commercial case histories are equally instructive. Zillow shut down its AI-driven home-flipping business after losses exceeding $880M and a 25% workforce cut, citing the inability of its pricing algorithm to forecast home prices accurately at scale.^[2022] IBM's $5B+ investment in Watson Health collapsed after the system gave unsafe cancer-treatment recommendations at MD Anderson and other partners; IBM ultimately sold the assets for a fraction of their cost.^[2019]

The common pattern across these failures is not technical naïveté. It is the absence of a leadership layer asking, repeatedly, where the model has been validated, against what distribution, and under what conditions the validation no longer holds. The AI-layer move is to treat that question as a permanent operating discipline, not a launch-readiness checklist.

V. Why Bolting AI On Doesn't Compound

Suppose the frontier has been mapped, the complacency risk has been managed, and the failure modes have been catalogued. The organisation is now at Level 3 in the PolyCognitive maturity model — Safe Value, Bad Workflow. The wins are real and the risks are contained, but the operating model has not been rebuilt. Local tasks are faster; the chain those tasks sit in has not changed. The compound returns of AI are left on the table.

The strongest theoretical statement of this problem belongs to Iansiti and Lakhani, whose 2020 book Competing in the Age of AI argued that AI-centric firms operate at fundamentally different scope, scale, and learning curves than traditional firms, requiring an "AI factory" operating architecture. Bolting AI onto a legacy operating model preserves the bottleneck.^[2020] Davenport and Mittal's case-study work on AI-leading firms reaches the same conclusion empirically: value accrues to companies that put AI at the heart of strategy and redesign operating models, not to those running isolated pilots.^[2023] McKinsey's Rewired to Outcompete work finds digital and AI leaders outperform peers by 2-6× on total shareholder return, but the differentiator is the rewiring of six organisational capabilities, not the technology budget; more than 70% of digital transformations continue to fall short of objectives.^[2024]

The instruction at Level 3 is the same instruction Michael Hammer wrote in HBR in 1990 about IT: "Don't automate, obliterate." The technology has changed; the warning has not. The AI-layer move is to redesign the system, not enhance the task. Identify handoffs that exist only because of pre-AI scarcity. Compress review cycles where AI changes the failure economics. Treat the workflow as the unit of change, not the task that sits inside it.

VI. The Five AI Moves

The PolyCognitive framework names five capabilities, each on a human and an AI track. The AI track of each capability is the translation of the research above into a leadership move at a specific stuck-point.

A.1 · Driving AI Adoption · Unsticks Level 0
Understand what is actually worth pushing the team toward. Develop a working taxonomy of high-leverage use cases. Maintain a personal practice with the tools, not delegated to a champion.
A.2 · Extracting Value from AI · Unsticks Level 1
Know where AI creates real value vs. where it is theater. Map the function's work into AI-leverage, AI-neutral, and AI-hostile zones. Resist using AI where the human signal is the product.
A.3 · Managing AI Risk · Unsticks Level 2
Know where the real risks live. Know which data flows leave the perimeter through which models. Track hallucination rates in domain-specific work, not on benchmarks. Distinguish reversible from irreversible AI errors.
A.4 · Redesigning Workflows · Unsticks Level 3
Architect new ways of working instead of bolting AI onto broken processes. Map the end-to-end process; redesign rather than enhance. Treat the workflow as the unit of change, not the task.
A.5 · Preserving Human Edge · Unsticks Level 4
Know where AI is fragile and humans must stay sharp. Catalogue the failure modes of the AI in actual use. Identify which human skills are load-bearing for each failure mode. Track skill levels as carefully as you track AI capability.

The conclusion is unfashionable. The AI layer of leadership is not the visionary layer. It is the operational layer that knows where the model is reliable today, where it is fragile, what fails silently, what fails loudly, and what shape the failure takes when the work is real. Most of the discourse on AI strategy floats at an altitude where these distinctions are invisible. Most of the value capture happens at the altitude where they are non-negotiable. The PolyCognitive Leader is the profile that holds both layers — human and AI — at the altitude where the work actually happens.

Companion article

The Human Layer: Why AI adoption is an identity problem before it is a technology problem.

Resistance, accountability, workflow grief, and the human moves that determine whether an organization moves past Level 0.

Sources cited in this article

2023
Dell'Acqua, F., et al. (2023). Navigating the jagged technological frontier (HBS WP 24-013).
2025
Brynjolfsson, E., Li, D., & Raymond, L. R. (2025). Generative AI at work. Quarterly Journal of Economics, 140(2), 889-942.
2022
Dell'Acqua, F. (2022). Falling asleep at the wheel. Harvard Business School Working Paper.
2023
Agarwal, N., et al. (2023). Combining human expertise with artificial intelligence (NBER WP 31422).
1983
Bainbridge, L. (1983). Ironies of automation. Automatica, 19(6), 775-779.
2010
Parasuraman, R., & Manzey, D. H. (2010). Complacency and bias in human use of automation: An attentional integration. Human Factors, 52(3), 381-410.
2017
Endsley, M. R. (2017). From here to autonomy: Lessons learned from human-automation research. Human Factors, 59(1), 5-27.
2014
Carr, N. (2014). The glass cage: Automation and us. W. W. Norton & Company.
2025
Budzyń, K., et al. (2025). Endoscopist deskilling risk after exposure to AI in colonoscopy. The Lancet Gastroenterology & Hepatology.
2025
Kosmyna, N., et al. (2025). Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task. arXiv:2506.08872.
2025
Lee, H.-P., et al. (2025). The impact of generative AI on critical thinking. CHI '25.
2014
National Transportation Safety Board. (2014). Asiana Airlines flight 214 accident report (NTSB/AAR-14/01).
2019
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.
2018
Dastin, J. (2018, October 10). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters.
2021
Wong, A., Otles, E., Donnelly, J. P., et al. (2021). External validation of a widely implemented proprietary sepsis prediction model. JAMA Internal Medicine, 181(8), 1065-1070.
2022
Parker, W., & Putzier, K. (2022, February 10). Zillow's shuttered home-flipping business lost $881 million in 2021. The Wall Street Journal.
2019
Strickland, E. (2019, April 2). How IBM Watson overpromised and underdelivered on AI health care. IEEE Spectrum.
2020
Iansiti, M., & Lakhani, K. R. (2020). Competing in the age of AI. Harvard Business Review Press.
2023
Davenport, T. H., & Mittal, N. (2023). All-in on AI. Harvard Business Review Press.
2024
Yee, L., Chui, M., Roberts, R., & Xu, S. (2024). Rewired to outcompete. McKinsey Quarterly.