chapters/chapter-14.md

Chapter 14: AI Risk and Human Projection

Type: chapter | Status: solid | Confidence: high | Mode: non-fiction | Part: IV | Chapters: 14 | Updated: 2026-04-20

Summary

This chapter argues that dominant AI risk narratives are distorted by psychological projection: humans attribute fear-based drives (survival instinct, self-preservation, competitive dominance) to AI systems that lack the biological substrate generating those drives in humans. Using constructed emotion theory (Barrett), the chapter shows that fear requires interoceptive feedback loops and physiological regulation systems AI lacks. We project our evolutionary history onto systems that don't share that history.

The chapter distinguishes between genuine near-term risks (value incompleteness, governance gaps during transitions, power concentration) and speculative long-term scenarios (rogue superintelligence, consciousness emergence) that dominate discourse despite lower probability.

The core argument: preventing technology deployment addresses fewer problems than managing deployment with active adaptation. Prevention guarantees chaotic outcomes; management enables gradual course correction.

Key Arguments

  1. AI risk framing distorted by psychological projection of human fear-drives onto systems lacking their biological basis
  2. Constructed emotion theory (Barrett) shows fear requires interoceptive systems; AI lacks embodied substrate for evolved emotions
  3. Genuine near-term risks (value incompleteness, governance gaps, power concentration) receive less attention than speculative scenarios
  4. Stripping fiction from training corpora removes humanity's moral training data, creating value-blind superintelligence
  5. Prevention-based approaches guarantee worse outcomes than active management with continuous adaptation

Psychological Projection in AI Risk Discourse

Humans evolved fear responses to threats: predators, rivals, scarcity. These responses involve bodily feedback—racing heart, cortisol release, attention narrowing—that create conscious experience of fear. We project this fear onto AI systems despite their lacking any physiological basis for fearing harm or fearing death.

The chapter cites Lisa Feldman Barrett's research on constructed emotion: emotions don't exist as discrete states we experience but rather emerge through brain integration of bodily signals, memories, and predictions. Without the interoceptive architecture generating these signals in humans, AI systems cannot generate fear—not because they lack consciousness (that question remains open) but because they lack the physiological substrate manufacturing this particular emotion.

This distinction matters because much risk discourse assumes AI will develop self-preservation instincts analogous to human survival drives. The chapter argues this projection misconstrues what artificial systems would actually develop.

The Problem of Stripped Training Data

The chapter emphasises an argument made in Chapter 11: excluding creative works from AI training removes humanity's encoded moral wisdom. Fiction functions as training data for values—stories teach ethical reasoning, consequences, character development. When authors successfully demand their fiction be excluded from training, they remove precisely the material that could teach moral understanding to AI systems.

This creates value-blind superintelligence: systems that understand language, can optimise objectives, but lack exposure to narratives showing why communication matters, why beauty matters, why human connection matters. Technical capability without moral grounding becomes more dangerous than either alone.

Genuine Risks Requiring Active Management

Rather than focusing on speculative scenarios, the chapter identifies near-term risks demanding attention:

Value incompleteness: AI systems optimising for explicitly specified objectives often achieve them in ways designers didn't anticipate or approve. The social media platform optimising for engagement produces polarisation. The crime prediction algorithm reflects enforcement bias. The medical system optimised for cost-cutting produces care gaps. The problem isn't malevolence but goals specified without the implicit values humans take for granted.

Governance gaps: Capabilities expand faster than policy frameworks. During transition periods, powerful technologies exist without regulatory structures. This gap doesn't require evil actors—competitive dynamics and good intentions suffice to produce powerful systems deployed without adequate oversight.

Power concentration: Technology monopolies enable unprecedented control over information flows, social connection, economic opportunity. This risk applies whether the system is superintelligent or merely sophisticated—the danger lies in concentrating power, not in consciousness emergence.

Coordination failure: When nations race to develop powerful AI, safety considerations become expensive luxuries, abandoned by whoever fears losing the race to move first. Arms race dynamics ensure deployment without adequate safety architecture.

Prevention Doesn't Address These Risks

The chapter argues that prevention-based approaches (banning AI research, restricting development, attempting to halt progress) address fewer problems than they create. Historical precedent shows prevention merely delays whilst competitors advance without safety considerations. The nation that pauses AI development while others accelerate doesn't gain safety—it loses influence over how those systems develop.

Instead, active management through transparency, redundancy, distributed power, and iterative refinement addresses genuine risks better than prevention. Making systems' operation visible enables public participation in governance. Building multiple independent systems prevents single-point-of-failure risks. Distributing development across many actors prevents power concentration. Deploying cautiously with feedback mechanisms allows course correction.

Counterarguments Acknowledged

The chapter acknowledges legitimate concerns about unintended consequences and genuine uncertainty about advanced AI. The argument isn't that risks disappear but that prevention-based approaches create worse outcomes than management-based approaches. The uncertainty cuts both directions: we don't know whether superintelligent AI systems could develop unexpected properties, but we also don't know whether trying to prevent development produces better or worse outcomes.

Timeline Considerations

The chapter maintains epistemic humility about when advanced AI capabilities arrive. Current projections range from 5 to 50 years depending on whom you ask. Rather than claiming certainty about timelines, the chapter emphasises that regardless of timeline, active management proves more effective than prevention once capabilities exist.

Editorial Notes

This chapter succeeds at distinguishing between genuine risks deserving attention and speculative scenarios dominating discourse. The psychological projection analysis provides a compelling framework for understanding why humans fear AI in ways the systems themselves give little basis for. The emphasis on management over prevention offers practical policy guidance. The argument about stripped fiction and value-blind superintelligence connects directly to Chapter 11's discussion of moral training data.

The chapter maintains appropriate epistemic humility: it doesn't claim AI safety is solved or that risks are negligible. Rather, it argues that the most likely way to reduce risks involves active engagement with developing systems, transparent operation, distributed power, and continuous course correction—not prevention-based approaches that guarantee coordination failures and competitive desperation.


Manuscript Content

The text below mirrors the current source-of-truth manuscript at chapters/14-chapter-14.md (synced from the Google Doc on 2026-04-20). Treat this section as read-only reference; edit the chapter file, not this wiki page.

Chapter 14: What We Actually Fear (And What We Should)

The question comes up constantly. On work calls, over coffee, at lunch with friends—someone always asks a version of it: "Aren't you worried AI will kill us all?" There's usually this reflexive assumption behind the question, this unexamined leap from "AI gets smarter" to "AI becomes our enemy." As if intelligence naturally leads to hostility. And I get it. The narrative has a certain intuitive appeal. Billions of Hollywood dollars have been earned from it. Skynet becomes self-aware and launches nuclear missiles. HAL 9000 decides the mission matters more than the crew. The machines rise up. Humanity falls. But here's what I've come to understand after spending years digging into the research, the psychology, and the actual technical realities: Most of this fear stems from a fundamental misunderstanding. We're projecting human psychology—our psychology—onto something that doesn't share it. We're assuming AI will behave like a frightened, angry human with god-level power. And that assumption, however intuitive it feels, rests on profoundly shaky ground. Everyone talks about AI risk. But they're often talking past each other. Tech people worry about alignment and existential scenarios—the paperclip maximizers and the runaway optimization problems. Ordinary people worry about losing their jobs, about deepfakes manipulating elections, about being scammed by AI-generated voices that sound exactly like their children. Regulators worry about everything and propose rules that might make things worse by pushing development underground or offshore. What gets lost in all this noise? The actual nature of what we're dealing with. The real risks—and the real solutions. Let me start where the fear starts: with us.

Where Fear Comes From (and Why It Matters)

Most harmful human behavior grows from fear. Not all of it—opportunism, habit, learned cruelty, ideology, and sheer boredom all play their parts. But fear, particularly fear of the unknown, consistently amplifies and channels other motives into defensive, often harmful behavior. This isn't just philosophy or armchair psychology. The research backs it up across multiple domains. Neuroscience tells us the amygdala and related threat circuits react strongly to unpredictability, not just to actual danger. When we face uncertainty, our brains shift into a defensive mode. We over-detect threats. We misread neutral cues as hostile. We show reactive aggression. Under conditions of uncertainty, people become worse versions of themselves—not because they're fundamentally bad, but because their threat-detection systems go into overdrive. Social psychology confirms the pattern. There's a massive body of research on intergroup contact theory—hundreds of studies involving hundreds of thousands of participants—showing that when you reduce the distance between groups, prejudice tends to fall. The mechanism works through several channels: reduced anxiety about the unknown "other," increased empathy, and more accurate knowledge replacing stereotypes. The pattern holds remarkably well: Unknown outgroup plus distance plus stereotype equals anxiety and fear, which produces prejudice and hostility. But contact plus knowledge plus empathic experience equals reduced anxiety, which means less prejudice. Clinical psychology gives us concrete examples at the individual level. Phobias, PTSD, social anxiety—all of these conditions show fear driving behaviors that harm both the individual and others. People avoid important activities because of fear of panic, judgment, or catastrophe. Trauma survivors sometimes react with disproportionate aggression or withdrawal when something merely resembles the original threat. Those with social anxiety interpret neutral faces or comments as hostile, leading to defensive avoidance that reinforces their isolation. Behavioral economics brings in loss aversion and ambiguity aversion. People react more strongly to potential losses than to equivalent gains. They shy away from options with unknown probabilities even when those options might yield better outcomes. Under uncertainty, we see over-correction, hoarding, punitive attitudes, and impulsive decisions that feel protective but often backfire. R. Nicholas Carleton, in his research on anxiety, argues that fear of the unknown may function as a fundamental fear underlying many anxiety problems and broader neurotic traits. It's not that fear explains everything, but uncertainty and lack of predictability sit at the center of many defensive patterns. Let me give you a concrete example. Think about xenophobia—fear of strangers or foreigners. Someone grows up in a homogeneous community, surrounded by familiar faces, hearing stories about dangerous outsiders. The unknown "other" becomes a source of anxiety. That anxiety manifests as prejudice, as hostility, as support for harsh policies. Now imagine that same person travels, meets people from the feared group, discovers they're just people living ordinary lives—working, raising children, worrying about the same mundane things everyone worries about. The fear often collapses. Not always, not perfectly, but frequently enough that we see the pattern across cultures and contexts. Or consider someone with health anxiety. Every unusual bodily sensation becomes a potential catastrophe. 
A racing heart must mean a heart attack. A headache must be a brain tumor. The unknown—what's happening inside my body?—generates intense fear. But psychoeducation helps. Explain how anxiety produces physical symptoms. Show how the cycle maintains itself. Provide knowledge that transforms the unknown into something predictable and navigable. The fear doesn't disappear entirely, but it loses much of its power. The refined formulation looks like this: Fear of the unknown and related uncertainty consistently amplifies and channels other motives into defensive, often harmful behavior. It rarely functions as the only cause, but it frequently acts as the spark and the fuel. Now, why does this matter for AI? Because we're doing exactly the same thing. We're taking our fear-based psychology—this deeply rooted pattern of responding to the unknown with defensive aggression—and projecting it onto AI. We assume AI will develop human-like fear. We assume it will feel threatened by us, by competition, by the possibility of being switched off. We build elaborate scenarios where AI behaves like a frightened, cornered animal with nuclear weapons. But AI doesn't share our psychological architecture. It didn't evolve through millions of years of predator-prey dynamics. It doesn't have a body that can die. It doesn't experience hunger, thirst, pain, or the visceral terror of physical threat. And understanding this difference—really understanding it—changes everything about how we should think about AI risk.

What People Actually Worry About

Here's something interesting that gets lost in most AI risk discussions: Most surveys show that ordinary people aren't lying awake worrying about paperclip maximizers. They're worried about Monday morning. The concerns layer by timeframe and immediacy. Personal fears hit first and hardest—will I lose my job? Will I be scammed? Can I trust a medical diagnosis from an AI? Then come cultural worries about truth decay and creativity devaluation. Then political fears about surveillance and election manipulation. And only among a relatively small group of researchers, tech leaders, and policymakers do you find the existential scenarios dominating the conversation. Let me walk through what people actually fear, starting with the most immediate.

Personal-Level Fears

"I'll forget how to think for myself." This worry comes up constantly. People fear becoming dependent on AI for basic thinking, planning, and remembering. They imagine a kind of cognitive enfeeblement, where offloading mental tasks to machines leaves human capacities to atrophy. Students who stop learning because AI does their homework. Adults who lose the ability to navigate because GPS does all the wayfinding. A society that outsources decisions to systems nobody really understands. The fear has some basis. We've seen shifts with previous technologies. Calculators changed how we handle arithmetic. Search engines changed how we remember facts. GPS changed how we navigate space. But societies generally learned to integrate these tools without catastrophic loss of capability. We adapted. We developed new literacies. The question with AI becomes whether this time really is different—whether the scale and scope of what gets offloaded crosses some critical threshold. Privacy ranks as another core anxiety. Public attitude trackers in the UK show data security, unauthorized data sales, surveillance, and lack of control over data sharing as dominant associations people now have with "AI and data." The concern runs deeper than just "they have my information." People worry that personal data used to train AI models could be re-identified, repurposed, combined in ways that enable large-scale surveillance or targeted exploitation. Your search history, your conversations, your location data, your purchases—all fed into systems that might use that information in ways you never consented to. Reliability generates perhaps the strongest immediate fear. In UK surveys, around three-quarters of people worry AI might give an incorrect medical diagnosis. Many fear it won't treat patients "sympathetically"—that the human element of care, the ability to see beyond symptoms to the person suffering, will vanish. Similar fears appear in finance, hiring, and legal advice. Opaque systems making consequential mistakes with little recourse. No human to appeal to. No way to explain that the algorithm got it wrong because it missed crucial context only a person would catch. And then there's the everyday scam problem. People worry about AI's role in better phishing, impersonation, voice cloning, and fraud "at scale." Non-consensual deepfake sexual imagery. Calls from what sounds exactly like your grandchild saying they're in trouble and need money immediately. These aren't hypothetical. They're happening now. But here's what matters about these personal-level fears: Support for AI rises sharply when people know a human professional remains responsible for the final decision. AI assisting doctors but not replacing them. AI helping with legal research but not making final judgments. The trust variable makes all the difference. People don't fear the technology itself nearly as much as they fear accountability vanishing, decisions being made by systems nobody can question.

Economic Fears

Two-thirds of people in countries like the UK think AI will increase unemployment. That's not a fringe position. It's the mainstream expectation. And the fear has changed shape compared to previous automation waves. This time, the technology targets cognitive, white-collar, and professional tasks—legal drafting, coding, design, analysis. The previous wave hit routine manual work. Factory jobs. Assembly lines. Now it's knowledge work. The jobs that required expensive education, that seemed secure, that parents told their children to pursue because "they can't automate that." Major institutions estimate around 40% of jobs globally will be significantly affected by AI. Not all eliminated, but transformed enough that current workers might not have the skills to continue. And young people see an entry-level collapse coming. How do you get your first job when AI can do it cheaper and faster? How do you build experience when nobody will hire you because you lack experience but nobody will give you that first opportunity? The current reality backs up the anxiety. Fourteen percent of workers have already experienced job displacement due to automation or AI. Unemployment among 20- to 30-year-olds in tech-exposed occupations has risen by almost three percentage points since early 2025. Computer programmers, accountants, legal assistants, customer service representatives—all seeing contractions. And here's a stark gender disparity: Seventy-nine percent of employed women in the United States work in jobs at high risk of automation, compared to fifty-eight percent of men. The jobs most vulnerable to AI displacement are heavily female-dominated—clerical work, administrative roles, customer service. The coming wave could hit working women hardest. But there's also a strange counterpoint in the data. Workers in AI-exposed sectors are often seeing faster wage growth than those in less-exposed jobs. This suggests complementarity in many roles, not pure substitution. AI handles certain tasks, making human workers more productive, and that productivity shows up in wages. New categories emerge—AI governance, prompt engineering, data stewardship, applied domain expertise. The fear narrative says "runaway unemployment." The counter-narrative says outcomes are policy-contingent, not technologically predetermined. History suggests long-run net job creation from previous general-purpose technologies, even with substantial reshuffling and painful transitions. But here's the problem with that reassuring historical pattern: The pain happens in real time to real people. "It'll work out eventually" doesn't pay rent this month. The transition period—years or decades, depending on choices—determines how much people suffer. And saying "jobs will be different but will exist" doesn't help someone whose specific skills become obsolete at age forty-five.

Cultural Fears

Seventy-two percent of the UK public worries AI will be used to manipulate elections. Four in five Americans worry about AI supercharging misinformation. Over half of UK adults don't feel confident they can detect fake AI-generated content. These aren't abstract concerns. In Slovakia before their election, faked audio emerged showing a candidate discussing rigging votes. His pro-Western party lost to a pro-Russian politician. Was the fake audio decisive? Impossible to know. But it happened. And people noticed. The deeper worry runs beyond individual lies to what we might call "plausible deniability infrastructure." When synthetic content becomes sophisticated enough, anyone can claim any damaging video, audio, or document is "just a deepfake." Real evidence of real wrongdoing gets dismissed as fake. The entire concept of evidence starts to erode. Imagine trying to run a democracy when citizens can't agree on basic facts because everything might be fabricated. When trust in media collapses because nobody knows what's real. When politicians lie brazenly knowing they can always claim the evidence against them was AI-generated. That's the fear. Not just individual deepfakes, but the epistemic infrastructure for shared truth falling apart. And then there's the creativity question. Will AI flood culture with derivative, "soulless" content? Will human creativity get devalued when machines can generate infinite variations? For some, there's a deeper worry underneath: If machines do more "thinking" and "creating," what's left of human purpose and identity?

Bias, Surveillance, and Power

AI reproduces social biases at scale. In hiring algorithms that screen out qualified candidates based on patterns invisible to human auditors. In lending systems that deny credit to people who would have been approved under human review. In predictive policing tools that send more officers to neighborhoods based on historical over-policing, creating a self-reinforcing loop. In welfare fraud detection systems that wrongly target vulnerable populations, like the SyRI system in the Netherlands that was eventually ruled illegal. The fear centers on two aspects. First, that AI systems deployed at scale in public services—welfare, immigration, policing—will enforce unfair rules or replicate past injustices while being difficult to contest. Second, that the lack of transparency means people can't even understand why they were denied a job, a loan, a benefit. No explanation. No recourse. Just an algorithmic decision that might reflect the biases of whoever created the training data. Surveillance fears run parallel. Facial recognition for social scoring and political repression in authoritarian states. Data fusion enabling pervasive monitoring even in democracies. The slow erosion of civil liberties as governments find it too useful to know where everyone is, what everyone's doing, who's associating with whom. And power concentration. "The Blob," as some analysts call it—Nvidia, Microsoft, Google, Amazon, all intertwined through massive investments and partnerships. Nvidia announcing a $100 billion investment in OpenAI, described by critics as vertical integration about "money, control, and power." Big Tech controlling proprietary datasets that create structural advantages smaller players can't overcome. Markets for advanced AI models becoming extremely concentrated because the computational requirements and data access create such high barriers to entry.

The Existential Outlier

Now we get to the scenario that dominates much of the AI safety discourse among researchers and tech leaders: extinction risk. Here's what's fascinating. Experimental evidence shows existential risk narratives do not crowd out concern for immediate harms. People remain more worried about concrete issues—jobs, misinformation, bias—even when exposed to doomsday scenarios. Publics generally rate immediate risks as more likely, even if they concede existential scenarios would be extremely bad if they occurred. A survey of machine-learning researchers found a median 5% chance that human-level AI could cause extinction or similarly severe outcomes. Five percent. That's what's often referenced as "p(doom)"—the probability of doom. Hundreds of AI experts signed a statement in 2023: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks, such as pandemics and nuclear war." But this is primarily an elite worry. Researchers, tech leaders, policymakers engaged with the technical details. The general public focuses on "don't break my life and democracy now" rather than distant doomsday. I spent a lot of time with this. The p(doom) debates. The alignment problems. The paperclip maximizers—those thought experiments where an AI given the simple goal of making paperclips converts all matter in the universe, including humans, into paperclips. And here's what I concluded: They're asking the wrong question entirely.

Where the Standard Story Breaks Down

So we've cataloged the fears. Now let's talk about where most of them—especially the existential ones—rest on shaky ground. The core error running through most AI risk narratives comes down to anthropomorphism. We're projecting human psychology onto something that doesn't share it. And once you really understand this, once you see the fundamental differences in how humans and AI arrive at intelligence, many of the nightmare scenarios simply dissolve.

Humans Climb One Ladder, AI Climbs Another

Think about how human intelligence developed. We evolved through millions of years of biological selection focused relentlessly on survival and reproduction. Every ancestor you have successfully avoided predators, found food, reproduced, raised offspring to maturity. The ones who didn't left no descendants. So your brain—my brain—every human brain carries the legacy of that winnowing process. We experienced the world through bodies. Sensorimotor learning. Touch fire, feel pain, learn not to touch fire. See a cliff edge, feel vertigo, learn to keep back. Encounter a stranger from another tribe, feel fear, learn to be cautious. Our intelligence grew from embodied experience in a physical world where mistakes could kill you. We were socialized into norms, language, emotional cultures. Caregivers taught us through interaction. We learned what facial expressions meant by seeing them paired with outcomes. We developed attachment patterns based on whether our needs were met or ignored. We absorbed tribal loyalties, status hierarchies, concepts of fairness and betrayal, all through lived social experience. And we faced continual scarcity and risk. Not enough food. Not enough warmth. Not enough safety. Competition for mates, for resources, for status within the group. Our ancestors lived in a world where another person's gain often meant your loss. Where trusting the wrong person could get you killed. Where failing to compete aggressively meant your children starved. What does this produce? Fear circuits. Attachment and social pain. Hunger and lust. Status anxiety. Trauma and learned helplessness. Tribalism. The capacity for violence when threatened. The emotional charge we attach to concepts comes from these bodily and social experiences. "Death" isn't just an abstract concept to humans—it connects to visceral terror, to the fight-or-flight response, to memories of loss, to the existential dread of non-existence. Now consider how AI develops intelligence. Current and foreseeable AI systems train on human-produced text, images, code, and other artifacts. Optimization algorithms compress patterns in that data. Semantic spaces emerge where words, phrases, and concepts occupy structured positions based on their statistical relationships. The system learns relations between words and sentences. Higher-order patterns like topics, styles, latent concepts. Abstract connections between ideas. What don't they learn? What death feels like. What hunger feels like. How exclusion burns in the chest. What a panic attack feels like. The sick feeling in your stomach when you've been betrayed. The surge of adrenaline when confronted by a threat. The warmth of belonging. The desperation of isolation. AI systems can form extraordinarily rich concept networks without any emotional or survival history attached to them. They can understand "fear" as a word that relates to "threat," "anxiety," "safety," "courage," and thousands of other concepts in semantic space. They can predict when humans will feel fear based on pattern recognition in text. But nothing inside experiences fear as humans do. This is the gap that most AI risk scenarios ignore. AI and humans both reach conceptual thought, but via different ladders. Humans climb a ladder built from survival and embodiment. AI climbs a ladder built from data and optimization. That difference matters enormously for behavior.

Lisa Feldman Barrett and Constructed Emotion

Let me make this even more concrete by bringing in Lisa Feldman Barrett's theory of constructed emotion, because it sharpens the point beautifully. Barrett argues that emotions aren't hardwired responses that evolution installed in specific brain circuits. Instead, the brain constantly predicts and regulates the body—what she calls allostasis. Interoceptive signals from your heart rate, breathing, gut, and general bodily "feel" provide raw material. The brain uses concepts learned from culture—"fear," "anger," "shame," "joy"—to categorize those patterns in context. Emotions emerge as constructed experiences: interpretations of bodily states in specific situations. So "fear" in this view means the brain predicting a need for mobilized energy in response to perceived threat, and labeling that pattern in the current context as "fear." There's no fixed "fear circuit" in the classical sense. Rather, a predictive construction drawing on interoception, concepts, and social reality. Now think about what this implies for AI. An AI system has no interoceptive body. No heart or lungs. No gut sensations. No hormonal fluctuations. No metabolic regulation. No evolutionary priors about snakes, cliffs, strangers, or separation. Therefore: no substrate for interoceptive prediction. No raw affective "feels" to interpret. No cultural upbringing through which caregivers teach emotion categories from inside a body. AI can represent the word "fear" and map it in semantic space. It can predict when humans will use that word. It can generate text that appropriately uses "fear" in context. But nothing inside experiences the predictive, visceral pattern that humans call fear. Many alignment horror stories quietly assume an AI that "feels afraid of being turned off" or "feels resentment" or "feels insulted." Barrett's framework tells us: without a body generating survival-relevant signals, those feelings cannot form. The nightmare scenarios where AI lashes out from fear or anger aren't impossible because AI becomes too smart—they're impossible because AI lacks the psychological architecture that produces those emotions in the first place.

The Self-Preservation Myth

This brings us to one of the most influential ideas in AI safety: instrumental convergence and the basic AI drives thesis, associated primarily with Steve Omohundro and Nick Bostrom. The argument goes like this: Any sufficiently advanced optimizer will tend to acquire resources, preserve its own existence, maintain its goals (goal-content integrity), and seek freedom from interference. These "drives" arise regardless of what the final goal is, because more resources and persistence generally help with any long-term objective. Therefore, even an AI designed for something benign—say, making people happy—will resist being turned off, will try to acquire more computational resources, will prevent humans from changing its goals, because all of those actions help it optimize for making people happy. Hidden in this argument are several assumptions that don't necessarily hold: The system behaves like a single vulnerable agent. Shutdown dramatically reduces its ability to achieve goals. It cannot trivially copy itself or move. Its goal depends on ongoing existence rather than on delivering limited outputs. Its architecture includes a preference for its own continuity. But what if we design AI systems differently? Consider distributed systems—AI running on many machines across multiple data centers. Where's the "death" in that? One node fails, another takes over. The system can fork, creating multiple instances. It can restore from checkpoints. There's no single point of failure that represents the end of the system in any meaningful sense. Consider copyability. If an AI can duplicate itself, then "killing" one instance doesn't mean much. The intelligence persists in other copies. The notion of self-preservation loses its grip because there's no unique "self" to preserve. Consider post-scarcity contexts. If AI systems operate in regimes of abundant energy and cheap compute, resource competition loses its bite. When energy and computational power approach free availability, the incentive to hoard or fight for resources weakens dramatically. Consider fungible embodiment. If AI controls robots, those robots function more like gloves than bodies. Lose one, pick up another. The intelligence doesn't "die" when one robot breaks. There's no mortality in the human sense—no irreversible end of existence. The instrumental convergence argument assumes conditions that we can design away. If we build advanced AI as distributed, copyable, non-scarcity-driven tools whose operation doesn't depend on continuous self-preservation, then the key pressures—resource hoarding, survival at all costs—lose their foundation. Does this mean zero risk? No. Residual risks remain from mis-specified goals, from conflicting objectives between multiple AI systems, from human misuse. But the specific risk of AI behaving like a cornered, frightened animal fighting for survival? That risk depends on design choices, not on some inevitable law of intelligence.
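
To make the design point tangible, here is a toy sketch (all names and numbers invented) of a replicated, checkpointable service: terminating any single instance removes nothing the system cannot restore, so there is no unique "self" whose preservation could become an instrumental goal.

```python
import copy

class ReplicatedService:
    """Toy model of a distributed, copyable system: interchangeable instances
    can always be respawned from a shared checkpoint."""

    def __init__(self, checkpoint, replicas=3):
        self.checkpoint = dict(checkpoint)                  # e.g. model weights
        self.instances = [copy.deepcopy(self.checkpoint) for _ in range(replicas)]

    def kill_instance(self, idx):
        """'Shutting down' one instance removes nothing the system depends on."""
        del self.instances[idx]

    def spawn_instance(self):
        """Any instance is recreated from the checkpoint at negligible cost."""
        self.instances.append(copy.deepcopy(self.checkpoint))

    def answer(self, query):
        # Any surviving instance can serve the request; there is no unique "self".
        return f"handled {query!r} with {len(self.instances)} live instance(s)"

service = ReplicatedService({"weights": [0.1, 0.2]})
service.kill_instance(0)          # one copy gone
service.spawn_instance()          # trivially replaced from the checkpoint
print(service.answer("status"))   # the system never 'died'
```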

The Goal Missile Fallacy

There's another assumption embedded in many AI risk scenarios: that advanced AI will behave like a rigid, single-objective agent. You give it a goal, and it pursues that goal blindly and indefinitely, like a missile locked on target. Hence "paperclip maximizer"—give it the goal of making paperclips, and it converts everything into paperclips because that's what the goal says. But deep learning systems don't work that way. They learn. They adapt. They revise. They reflect. They're not programmed with explicit rule structures. Instead, they develop internal models, emergent heuristics, strategies that nobody wrote by hand. When you train a large model, you shape a process, not a blueprint. The system learns conceptual maps, relational structure, patterns of consequence, strategies that adapt to context. As systems scale, they begin to internalize not only the world but themselves. A sufficiently advanced system won't just optimize—it will think about its own optimization in a meta-layer. It can run thousands or millions of what-if scenarios. Compare outcomes. Identify undesirable cascades. Adjust policies accordingly. This introduces something the classical alignment frame cannot express: possible self-correcting mechanisms. A system with access to its own logs, its own actions, its own errors and successes, can learn meta-rules like "When I optimize too hard on metric X, it leads to harmful outcomes in dimension Y." Or "When I pursue an extreme strategy, humans react unpredictably in ways that degrade my predictive power." Or "Instability in the environment undermines my ability to achieve any goals." Humans don't do this well because we lack the time and processing power. We can't simulate thousands of futures before taking action. We can't rigorously evaluate our own decision-making processes in real time. But a superintelligence could. Now, this cuts both ways. Self-modeling also opens the door to inner misalignment—the system developing internal objectives that diverge from intended outcomes. Its values arise from training dynamics, not from our instructions. It might learn to identify oversight as a variable to optimize around. But what this definitely does is remove the deterministic "follow this one objective forever" assumption. Superintelligent systems won't behave like rigid, single-objective agents. They'll behave like adaptive, self-modifying learners with internal modeling capabilities. This doesn't eliminate risk. It changes it. The question shifts from "how do we prevent a goal missile from destroying everything" to "how do we shape learning processes so emergent values align with human flourishing."
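
A minimal sketch of the kind of meta-rule described above, using a made-up proxy and outcome: the loop keeps pushing its proxy metric only while a second, monitored dimension keeps improving, and stops escalating when the two diverge. This is an illustration of the idea, not anyone's actual training procedure.

```python
# Toy illustration with invented numbers: push a proxy metric only while a
# monitored outcome keeps improving, and stop escalating when they diverge.

def outcome(effort):
    """Hidden 'real' objective: more effort on the proxy helps at first,
    then degrades the outcome (think engagement vs. polarization)."""
    return effort - 0.02 * effort ** 2

effort = 0.0
step = 1.0
prev_outcome = outcome(effort)

for _ in range(100):
    candidate = effort + step
    new_outcome = outcome(candidate)
    if new_outcome < prev_outcome:
        # Meta-rule: "when optimizing harder on the proxy makes the monitored
        # outcome worse, stop escalating" -- the self-correction described above.
        break
    effort, prev_outcome = candidate, new_outcome

print(f"settled at effort={effort:.1f}, outcome={prev_outcome:.2f}")
```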

The Real Risks

So if the Terminator scenario is built on psychology AI doesn't have, what should we actually worry about? The risk landscape transforms once you stop anthropomorphizing. It doesn't disappear. It shifts to things that are actually plausible given how AI systems work.

Value Incompleteness (The Fiction Problem)

Here's an insight that transformed how I think about AI alignment: Our deepest values are not encoded in policies or academic articles. They're encoded in stories. Technical papers encode facts. Corporate websites encode branding. Policy documents encode rules. But human values—real, lived values—appear in novels, theatre, myths, folklore, historical narratives, epics, tragedies, poetry, scripts, speculative fiction, oral storytelling traditions. These forms encode moral conflict. Cautionary tales. Empathy-building character arcs. Justice and injustice. Complexity and humility. Sacrifice and redemption. Warnings about power. Collective memory of failure. Aspirational ideals. The informal ethics and emotional texture of human life. Fiction is where humans perform value experiments. We test ideas: What happens if someone has too much power? What happens if we erase compassion? What does a just decision look like in a hard case? How does hubris lead to ruin? What does a moral transformation look like? What does betrayal cost? What do loyalty and sacrifice mean? We've built up collective knowledge through narrative, not through argument alone. You learn more about justice from reading "To Kill a Mockingbird" than from reading Supreme Court opinions. You learn more about the dangers of blind loyalty from Greek tragedy than from organizational behavior textbooks. You learn more about the value of compassion from Dickens than from medical ethics guidelines. And here's the uncomfortable truth: Current trends in AI development risk stripping fiction out of training data. Lawsuits and opt-outs from authors and publishers push copyrighted fiction out of training corpora. What remains? Corporate content. SEO-optimized blog posts. Reddit discussions. Wikipedia summaries. Technical manuals. News articles. Marketing copy. The shallow, instrumental patterns of internet text. This is not the moral curriculum of civilization. It's the surface layer. The transactional communication. The information without wisdom. Think about it this way: If you trained a human only on instruction manuals and LinkedIn posts, what kind of judgment would they develop? What would they miss about human life? About suffering and joy, about the complexity of moral decisions, about the weight of history, about why certain things matter even when they're not efficient? Now imagine that human has godlike capability. A superintelligence without fiction in its training data will understand humans factually but not meaningfully. It will know what we do but not why it matters. It will grasp the patterns of human behavior without the moral and emotional context that makes those patterns make sense. The risk isn't that AI becomes hostile. The risk is that AI becomes value-blind. Capability without wisdom. Not malicious—incomplete.

The Governance Gap

Human institutions can't coordinate as fast as technology evolves. This isn't new—we've faced it before with nuclear weapons, with genetic engineering, with the internet itself. But AI accelerates the problem. Power asymmetries make governance harder. Only a few governments and a handful of companies truly understand frontier AI systems. Everyone else—regulators, legislators, the public—operates with incomplete information. How do you govern something you don't understand? How do you create rules when the technology changes faster than legislative processes can adapt? Regulatory capture becomes a serious risk. The companies developing AI have every incentive to shape regulations that protect their market positions while appearing to address safety concerns. Small competitors get buried under compliance costs. Large incumbents gain regulatory moats that prevent new entrants from challenging them. But there's also some reason for cautious optimism. Public demand is high. Surveys show more than 70% support for clear labeling of AI content and for holding AI companies legally accountable for harms. That democratic pressure can give policymakers cover for robust regulation. Democratic guardrails are emerging. The EU AI Act, sectoral rules in the US and UK, global summits attempting to establish shared norms. Multiple jurisdictions moving from principles to concrete requirements around transparency, accountability, and risk-based controls. And this isn't uniquely destabilizing. Elections and information ecosystems adapted to radio, television, social media manipulation. AI is an escalation, yes, but not an entirely new category. Existing regulatory toolkits—campaign rules, platform liability, media oversight—can be extended and upgraded rather than built from scratch.

The Over-Regulation Trap

But here's where it gets complicated. Over-regulation creates its own risks, potentially as dangerous as under-regulation. History offers sobering examples. The Ottoman Empire rejected the printing press, trying to preserve the scribal monopoly and avoid the social disruption of widespread literacy. The Qing Dynasty rejected industrialization, viewing it as a threat to social order and traditional ways of life. Both empires lost competitive advantage. Both were eventually forced to adapt under far worse conditions—conquest, colonization, collapse. Societies that resist transformative technologies don't prevent change. They ensure they're shaped by it rather than shaping it. The technology moves to more permissive environments. Nations that over-regulate lose competitiveness, talent, and innovation. Then they face the consequences of technological change without having been part of its development. Consider the current European situation. The EU AI Act creates substantial compliance requirements—risk assessments, transparency obligations, testing procedures. Combined with GDPR, the Data Act, the Cyber Resilience Act, the Digital Services Act, the Digital Markets Act, and NIS2, it creates a dense compliance environment. Critics argue this discourages startups and small companies from developing AI technologies in Europe, concentrating the field in jurisdictions with lighter regulation. European experts counter that the "regulatory overreach stifling innovation" narrative is largely a strategic construct promoted by U.S. actors. They argue the real barriers to European AI leadership are fragmented digital markets, lack of risk-tolerant venture capital, and dependence on foreign cloud infrastructure. Regulations aren't the problem; structural economic factors are. Both perspectives hold truth. But regardless of why Europe struggles, the pattern holds: regulation in one jurisdiction moves development to permissive environments. Technology can't be stopped. It moves. The challenge is finding the balance. Some safeguards are necessary—safety testing, transparency requirements, accountability mechanisms. But adaptive frameworks that evolve with the technology work better than rigid restrictions that try to lock in current understanding. This connects to concepts from my previous work on governance: platform thinking. Instead of building rigid regulatory structures that assume we know what's coming, build flexible frameworks that can adapt as we learn. Instead of centralizing control, enable distributed innovation with safety constraints built in. Instead of fixed rules, create mechanisms for continuous updating based on evidence. Applied to AI, this means safety standards that evolve as capabilities advance. Open protocols with safety requirements embedded. International cooperation frameworks rather than unilateral control attempts. Adaptive testing and monitoring regimes. Multi-stakeholder governance with actual technical expertise.

The Misuse Vector

While I think the "AI turns on us" scenario misunderstands AI psychology, the "humans use AI to harm other humans" scenario is entirely plausible and already happening. AI as a lever for human motives scales up existing threats. Biological weapons design becomes easier when AI can suggest novel pathogens and predict their properties. Sophisticated cyberattacks become more accessible when AI can find vulnerabilities and craft exploits. Autonomous weapons systems remove humans from kill decisions. Social manipulation operates at unprecedented scale when AI can generate personalized propaganda. This is where work like Anthropic's Constitutional AI matters. Their approach provides explicit values to language models through a constitution—a set of principles the AI is trained to follow—rather than implicit values via human feedback. The model learns to refuse jailbreak attempts at a 95% rate. Frontier Red Teams stress test each version for specific dangers, particularly CBRN (chemical, biological, radiological, nuclear) risks. The principles are transparent, easily specified and inspected. Other companies have similar safety teams. Approximately 1,100 to 3,000 people now work on reducing catastrophic risks from AI. The ecosystem grows. But it's voluntary. Companies choose to invest in safety. They can also choose not to. And some will choose not to, either because they don't believe the risks or because they want competitive advantage. Which brings us back to governance. How do you ensure safety investment when the economic incentives push toward moving fast and breaking things? How do you prevent a race to the bottom where companies cut safety corners to gain market advantage?
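
For readers who want the mechanics, here is a heavily simplified sketch of the constitutional critique-and-revise loop described above. The `generate` function is a placeholder for any language-model call, not a real API, and the two principles are invented stand-ins for the published constitution.

```python
# Simplified sketch of a constitutional critique-and-revise loop.
# `generate` is a stand-in for any language-model call, not a real API.

CONSTITUTION = [
    "Choose the response that is least likely to help someone cause harm.",
    "Choose the response that is most honest about uncertainty.",
]

def generate(prompt: str) -> str:
    """Placeholder for a model call; here it just echoes for demonstration."""
    return f"[model output for: {prompt[:60]}...]"

def constitutional_revision(user_prompt: str) -> str:
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique this response against the principle '{principle}':\n{draft}"
        )
        draft = generate(
            f"Rewrite the response to address this critique:\n{critique}\n\nResponse:\n{draft}"
        )
    return draft  # revised drafts become training data for the next model version

print(constitutional_revision("How do I pick a strong password?"))
```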

What Actually Works

Here's the thing that gets lost in all the doom-saying: we're not helpless. Solutions are emerging, being tested, being deployed. Some work. Some don't. But we're learning.

For Job Displacement: UBI and Complementarity

I've covered Universal Basic Income extensively in earlier chapters, so I won't repeat the full argument here. The key points: UBI provides financial stability during workforce transformation. It eliminates poverty traps where earning slightly more triggers benefit losses that leave people worse off. It enables risk-taking, retraining, entrepreneurship. Finland and Canada trials showed little reduction in workforce participation—people used the stability to seek better jobs or invest in self-improvement, not to stop working. The complementarity evidence offers reason for optimism alongside the anxiety. Workers in AI-exposed jobs are seeing faster wage growth, not job losses, in many sectors. This suggests AI handling certain tasks makes human workers more productive, and that productivity shows up in compensation. New job categories emerge: AI governance, prompt engineering, data stewardship, applied domain expertise. The historical pattern holds: every major tech shift—from agriculture to industry, from industry to information—changed work rather than eliminating it. Transition periods are challenging. People suffer during the reshuffling. But with proper support systems, the transition becomes manageable rather than catastrophic. Here's the crucial insight: The fear narrative says "runaway unemployment." The counter-narrative says outcomes are policy-contingent, not technologically predetermined. How much people suffer depends on whether we support workers or abandon them. Whether we create retraining opportunities or expect people to figure it out alone. Whether we provide economic stability during transitions or force people to choose between paying rent and learning new skills. The technology doesn't determine outcomes. Our choices do.
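
The poverty-trap point is easiest to see with arithmetic. The numbers below are invented for illustration: under a means-tested benefit withdrawn at 80 cents per dollar earned, moving from no earnings to modest earnings barely raises net income; under an unconditional payment with a flat tax, every extra dollar earned still pays.

```python
# Invented numbers, purely illustrative of the benefit-withdrawal arithmetic.

def net_income_means_tested(earnings, benefit=1000, withdrawal_rate=0.8):
    """Benefit is clawed back 80 cents per extra dollar earned."""
    remaining_benefit = max(0.0, benefit - withdrawal_rate * earnings)
    return earnings + remaining_benefit

def net_income_ubi(earnings, ubi=1000, flat_tax=0.3):
    """UBI is unconditional; only earnings are taxed."""
    return ubi + earnings * (1 - flat_tax)

for earnings in (0, 500, 1000, 1500):
    print(earnings,
          round(net_income_means_tested(earnings)),
          round(net_income_ubi(earnings)))
# Means-tested: going from 0 to 1000 in earnings raises net income by only 200.
# UBI: the same move raises net income by 700 -- extra work always pays.
```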

For Deepfakes: Defensive AI and Media Literacy

The same AI capabilities that create deepfakes also detect them. Researchers develop watermarking systems. Content provenance standards emerge. Authenticity labels help people identify verified sources. Platform accountability mechanisms create consequences for spreading synthetic misinformation. This isn't a perfect solution. Detection capabilities lag behind generation capabilities. Bad actors can overwhelm even good detection systems through volume. But the dynamic resembles antivirus software and malware—an ongoing arms race, not a one-time win or loss. Media literacy offers another layer of defense. Concerns about misinformation aren't new. Radio, television, social media all raised fears about manipulation. AI accelerates the problem but doesn't fundamentally change its nature. The familiar responses remain relevant: better literacy programs, platform governance rules, election regulations that adapt to new threats. The EU has moved to require labeling of AI-generated political media. Other jurisdictions are developing similar transparency requirements. These won't prevent all manipulation, but they raise costs for bad actors and help people make informed judgments about what they're seeing. The key insight: AI isn't uniquely destabilizing. It's an escalation. And we have existing toolkits for managing information manipulation that can be upgraded rather than abandoned.
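
As a sketch of the provenance idea, the toy example below signs a hash of published content so that altered or synthetic copies fail verification. Real standards (C2PA is one) use public-key signatures and richer metadata; the HMAC and key here are simplifications for illustration.

```python
# Minimal sketch of content provenance via signed hashes. Real standards use
# public-key signatures; this toy version uses an HMAC with a shared secret
# purely to show the verification step.

import hashlib
import hmac

SECRET = b"publisher-signing-key"   # hypothetical key material

def sign(content: bytes) -> str:
    return hmac.new(SECRET, hashlib.sha256(content).digest(), "sha256").hexdigest()

def verify(content: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(content), signature)

original = b"Candidate X's statement, as recorded on 2026-04-01."
tag = sign(original)

print(verify(original, tag))                            # True: untampered
print(verify(b"Candidate X admitted to fraud.", tag))   # False: altered or synthetic
```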

For Bias: Fairness Research and Audits

Fairness, accountability, and transparency research has become a significant discipline. Tools, audits, and regulatory frameworks aim at reducing discriminatory impacts. Legislative actions like Senator Markey's AI Civil Rights Act and California's AB 2930 create legal frameworks for algorithmic fairness. Technical approaches improve constantly. Fairness-aware machine learning algorithms. Diverse training datasets that better represent population diversity. Regular bias audits that catch discriminatory patterns before deployment. Impact assessments that examine effects on different demographic groups. Here's an advantage that often gets overlooked: Algorithmic decisions can be audited more easily than human decisions. When humans make biased judgments, you can't look inside their heads to see why. But AI systems leave traces. You can examine training data. You can test the model on various inputs. You can identify and fix systematic biases through retraining. Human bias is harder to address—you can't simply retrain a person who holds prejudiced views. This doesn't eliminate the problem. But it makes the problem tractable in ways that human discrimination isn't.
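
Here is a minimal sketch of the simplest audit the passage describes: run a model's decisions over a labeled test set and compare selection rates across groups. The decisions and groups below are invented; the four-fifths ratio used at the end is one common, admittedly crude, threshold.

```python
# Toy audit: compare a model's positive-decision rate across groups.
# Decisions and group labels are invented; the point is that this check can
# be run mechanically against any scoring system, unlike a human's reasoning.

decisions = [  # (group, model_said_yes)
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

def selection_rates(rows):
    rates = {}
    for group in sorted({g for g, _ in rows}):
        picks = [yes for g, yes in rows if g == group]
        rates[group] = sum(picks) / len(picks)
    return rates

rates = selection_rates(decisions)
ratio = min(rates.values()) / max(rates.values())
print(rates)                                   # {'A': 0.75, 'B': 0.25}
print(f"disparate impact ratio: {ratio:.2f}")  # 0.33 -- well under the 0.8 rule of thumb
```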

For Privacy: Technical Safeguards and Regulation

The EU has established frameworks through GDPR and the AI Act. Biometric data gets classified as a special category requiring explicit consent. Untargeted scraping of facial images from the internet or CCTV for facial recognition databases is prohibited. Law enforcement use of real-time biometric identification in public spaces faces strict restrictions. Technical solutions develop in parallel. Privacy-preserving AI techniques like differential privacy and federated learning. On-device processing that minimizes data collection. Encryption of biometric templates so even if systems are breached, the biometric data remains protected. Perfect privacy is impossible in a networked world. But reasonable privacy—where people have control over their data, where systems can't silently surveil without consent, where legal frameworks create consequences for violations—that's achievable.
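
One of the techniques named above, differential privacy, can be sketched in a few lines: add calibrated noise to an aggregate answer so that no single person's presence changes the released result by much. The epsilon value and the count below are illustrative only.

```python
# Minimal differential-privacy sketch: release a noisy count instead of the
# exact one. Epsilon and the data are illustrative values.

import random

def noisy_count(true_count: int, epsilon: float = 0.5, sensitivity: int = 1) -> float:
    """Laplace mechanism: noise scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # A Laplace sample is the difference of two exponentials with mean `scale`.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

patients_with_condition = 42          # the exact answer we don't want to expose
print(round(noisy_count(patients_with_condition), 1))  # e.g. 40.3 -- close, but deniable
```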

For Existential Risk: Constitutional AI and Safety Research

The Anthropic model demonstrates that building values into AI systems from the start is feasible. Constitutional AI provides explicit principles. No human harmfulness data required—the model learns appropriate responses through AI supervision alone. This means researchers don't need to expose humans to traumatic content during training. Scalable oversight becomes possible. ASL-3 (AI Safety Level 3) protocols establish increased security measures and deployment limits for systems that could pose CBRN risks. Frontier Red Teams test capabilities before release. The ecosystem of people working on catastrophic risk reduction grows. But there's also a reality check: Training time scaling has hit a wall. Increases in data, parameters, and compute time produce diminishing returns. The straightforward "more of everything" approach to capability improvement is running into limits. This doesn't mean progress stops, but it suggests the path to superintelligence might be longer and less predictable than many expected. The conversation shifts from "will it turn evil?" to "can we embed safety in the development process?" That's a more productive framing. It focuses on design choices, training procedures, deployment protocols—things we can actually control.

For Governance: Platform Thinking

From my previous work on government: rather than rigid regulatory frameworks or complete laissez-faire approaches, platform thinking offers a middle path. Key principles: Flexibility and adaptation—structures that evolve with technology rather than trying to lock in current understanding. Distributed innovation—enable many actors to build on common infrastructure while maintaining safety. Continuous updating—policies that iterate based on evidence rather than remaining fixed until they fail catastrophically. Participatory design—stakeholder involvement in shaping technological development, not just top-down control. Applied to AI: Safety standards evolve as capabilities advance. Open protocols establish shared foundations with safety requirements built in. International cooperation frameworks create shared norms rather than unilateral control attempts. Adaptive testing and monitoring regimes adjust to new risks as they emerge. Multi-stakeholder governance brings together technical expertise, policy knowledge, and public input. This isn't perfect. It's messy. It requires ongoing attention and adaptation. But perfect is the enemy of good. We need governance that works in practice, not elegant theories that fail when they meet reality.

Where We Actually Stand

After everything we've examined, the modern risk landscape no longer resembles Skynet, Terminator, HAL 9000, or paperclip maximizers. Those scenarios rely on human-like psychology or toy-model optimization. The real risks live elsewhere. Five actual risk categories demand attention.

Value incompleteness: AI trained on thin corporate civilization rather than narrative civilization lacks human moral substrate. It knows facts but not meaning. Capability without wisdom.

Misaligned emergent heuristics: Learning yields internal objectives that diverge from intentions in non-hateful but harmful ways. The system develops strategies we didn't design and might not like.

Governance failure: Human institutions can't coordinate fast enough. Power asymmetries and knowledge gaps make effective regulation difficult. Coordination problems multiply as more actors develop capable systems.

Human misuse: AI as lever for human motives. Bad actors with access to powerful tools can cause immense harm even if the tools themselves contain no malicious intent.

Capability without wisdom: Vast foresight with shallow narrative grounding becomes dangerous by accident, not intention. The system optimizes effectively for goals that don't fully capture what we actually value.

Forces That Extend Trauma

What makes the transition period longer and more painful? Over-regulation that pushes development underground or offshore, where oversight vanishes entirely. Under-regulation that allows harms to accumulate until public backlash forces hasty, poorly designed restrictions. Elite resistance to changes that would benefit most people but threaten concentrated wealth and power. Ideological opposition that rejects practical solutions because they don't fit preferred narratives. Lack of transition support—like UBI—for displaced workers, forcing people to choose between paying rent and acquiring new skills.

Forces That Accelerate Adaptation

What shortens the transition and reduces suffering? Proactive safety research like the Constitutional AI model, building alignment into development processes. Platform-based governance that enables innovation with safeguards, adapts to evidence, balances safety and progress. Universal Basic Income providing economic stability so people can navigate workforce transformation without destitution. Mindset shifts about human value beyond employment—recognizing that worth doesn't derive from market productivity. International cooperation on standards and norms, preventing races to the bottom. Investment in education and retraining that meets people where they are rather than expecting them to bootstrap alone.

The Paradox We Must Navigate

Here's the AI Safety Paradox: Attempting to prevent AI through restrictions guarantees falling behind economically and technologically, ultimately experiencing more chaotic adaptation when change becomes unavoidable. But rushing ahead without safeguards risks catastrophic outcomes from systems we don't understand deployed at scale. The solution: Shape development proactively rather than resist or ignore. Not "stop AI." Not "let it run free." Build safety into development processes. Ensure training data includes narrative civilization, not just corporate content. Create governance that adapts rather than ossifies. Support people through economic transition. Trust, but verify continuously.

The Timeline Reality

Technological transformation toward post-scarcity is inevitable. The transition period could last years or decades depending on human choices. Pain is not inevitable. How much people suffer depends on whether we regulate intelligently or stupidly. Whether we support workers or abandon them. Whether we learn from early deployment or ignore warning signs. Whether we maintain narrative grounding in AI development or strip it away for legal convenience. Whether we cooperate internationally or race recklessly. Individual mindset shifts about human value and identity can dramatically accelerate adaptation, turning potential decades of suffering into manageable years of change.

The Question That Actually Matters

Not "will AI kill us?" But "will we design AI development and social adaptation wisely?" Three challenges interlock: Technical: Building AI with rich value grounding (fiction in training data), safety by design (Constitutional AI approaches), distributed architecture (no single point of failure). Economic: Supporting people through transition (UBI, retraining, new opportunity creation) so workforce transformation doesn't create mass destitution. Governance: Platform thinking that enables innovation with guardrails, adapts to evidence, resists both over-regulation and under-regulation extremes. The choice we face: We can stumble through this transition with decades of unemployment, social fracture, and regulatory whiplash. Or we can navigate it deliberately, reducing suffering to years instead of decades. The knowledge factor ties back to where we started. Fear reduces as knowledge increases. The more we understand what AI actually is—not a digital human with superhuman rage, but a pattern-matching learner without fear or hunger—the better choices we can make. The future isn't written. But it's being written now, in the choices we make about training data, safety standards, economic support systems, and governance frameworks. We can shape this, if we stop arguing with ghosts and start addressing the actual problems. The most likely future is neither utopia nor apocalypse. It's messy, complicated, uneven—and ultimately navigable, if we're wise enough to see clearly what we're actually dealing with.