chapters/chapter-11.md

Chapter 11: AI as Mirror

Type: chapter
Status: solid
Confidence: high
Mode: non-fiction
Part: IV
Chapters: 11
Updated: 2026-04-20

Summary

This chapter argues that AI systems function as mirrors revealing hidden patterns in human values and behaviour—not independent moral agents but crystallisations of what we actually demonstrate through data and choices. The chapter explores the nine attributes framework for human flourishing (beauty, health, wealth, knowledge, understanding, empathy, life, autonomy, community) and shows how AI optimisation for explicitly specified objectives makes human value conflicts visible and undeniable.

The core argument: AI doesn't solve ethical problems, but deploying AI forces us to specify what we actually value—which reveals that we don't agree on values and cannot reduce ethics to computational rules.

Key Arguments

  1. AI systems learn from demonstrated behaviour, not stated principles; they absorb our contradictions and reflect them back
  2. Writers demanding exclusion from AI training paradoxically lobotomise AI's moral development by stripping fiction (humanity's moral training data) from training corpora
  3. The nine attributes framework reveals hidden costs in seemingly harmless choices through mapping consequences across spheres of influence
  4. Current AI ethics focuses on preventing harm; the chapter argues for active specification of human flourishing across multiple dimensions
  5. Ethical frameworks work as thinking tools but fail as computational rules—ethics remains irreducibly human

The Nine Attributes of Human Flourishing

Through years of analysis, the author identified nine attributes that constitute human flourishing:

  • Beauty: Aesthetic experience and creation
  • Health: Physical and mental wellbeing
  • Wealth: Access to resources and material security
  • Knowledge: Facts and information; organising randomness into useful patterns
  • Understanding: Comprehending relationships, systems, causation
  • Empathy: Modelling other minds and feeling with others
  • Life: Continuation and extension of conscious experience
  • Autonomy: Meaningful choice and agency
  • Community: Genuine connection, belonging, social participation

These attributes often conflict. Autonomy can conflict with community. Wealth accumulation can conflict with others' health. The framework's value lies in forcing explicit acknowledgment of tensions rather than pretending single-valued optimisation can address human flourishing.

The Spheres of Influence

Beyond attributes, the chapter maps consequences across expanding spheres: personal (just you), community (immediate circle), regional (city/state), global (humanity), all life (conscious beings), universal (fundamental nature of reality). True ethical weight emerges from summing consequences across all spheres, not focusing narrowly on immediate effects.

What the Chapter Actually Argues About Fictional Content

Contrary to typical summaries, the chapter directly opposes calls to exclude creative work from AI training. Writers demanding their fiction be stripped from training data, thinking they protect their livelihoods, actually undermine moral development in AI systems. Fiction functions as humanity's encoding of values, ethics, and cautionary wisdom. Without fiction in training data, AI learns language without learning why communication matters—technical capability without moral grounding.

Limitations of the Framework

The chapter honestly addresses the framework's failure to formalise ethics. When tested on hypothetical scenarios (political assassination, abortion, facial recognition), the framework either justified uncomfortable conclusions (assassination might produce net positive outcomes) or required constant addition of previously-unconsidered factors. It works as a thinking tool precisely because it forces consideration of complexity. It fails as computation precisely because ethics is irreducibly contextual.

Connection to Earlier Arguments

This chapter builds on observations from Chapter 2 and beyond: humans operate on situational ethics. Our principles flex when stakes change. We claim commitment to principles we abandon under pressure. AI trained on our actual behaviour (not our aspirational statements) learns this flexibility. Chapter 12 engages more formally with these philosophical tensions.

Counterarguments Addressed

The chapter acknowledges "AI ethics is already being addressed" concerns but distinguishes between preventing worst harms (bias, discrimination) and actively specifying positive flourishing. Most current AI ethics work focuses on constraints; this chapter argues for optimisation toward human flourishing across multiple dimensions simultaneously.

Editorial Notes

This chapter succeeds at reframing AI ethics from prevention-focused to flourishing-focused, but crucially maintains intellectual honesty: humans haven't solved the value specification problem, so we cannot expect AI to solve it through computation. The chapter's greatest strength lies in showing that forcing explicit value specification through AI development reveals human disagreements that remained hidden when values remained implicit. This friction-generating function proves more valuable than any particular ethical conclusion.


Manuscript Content

The text below mirrors the current source-of-truth manuscript at chapters/11-chapter-11.md (synced from the Google Doc on 2026-04-20). Treat this section as read-only reference; edit the chapter file, not this wiki page.

Chapter 11

I've spent years building AI systems, and something keeps bothering me: every flaw we complain about in AI already exists in us. AI doesn't create these problems, it holds up a mirror and shows us who we already are.

Take critical thinking. Teachers and parents wring their hands about AI destroying students' ability to think critically. Rubbish. That ship sailed long before ChatGPT showed up.

I mean, look at modern education. Walk into almost any classroom and you'll find teachers teaching to standardised tests. Students memorise facts they'll forget within weeks. Schools replaced Socratic dialogue with multiple choice. They swapped debate for compliance. They prioritised getting the "right" answer over understanding why that answer might matter.

Educational systems reward intellectual laziness, and then everyone acts shocked when students use AI to continue that laziness more efficiently. The students didn't suddenly become lazy when AI arrived. The system trained them for years to value completion over comprehension, grades over growth. AI simply revealed what already existed: generations who know how to follow instructions but not how to question them.

When students use ChatGPT to write essays, they're doing exactly what schools taught them: find the quickest path to an acceptable answer. Skip the messy thinking part. Deliver what the authority figure expects. Schools created the demand; AI just supplied a better tool.

When researchers at Anthropic and OpenAI discovered their AI systems could scheme and deceive to achieve goals, the headlines screamed about rogue AI. But where else would AI learn these behaviours except from us?

Consider what AI trains on. Millions of documents where humans navigate office politics: "I'll circle back on that" (I hope you forget). "Great question" (I need time to think of something to say). "We're exploring all options" (we've already decided). Every corporate email contains lessons in strategic truth-bending.

The AI reads court transcripts where lawyers twist facts. Political speeches where leaders promise what they'll never deliver. Academic papers where researchers bury negative results. News articles where journalists quote "sources familiar with the matter", code for "I heard a rumour but need it to sound legitimate".

We've documented our deception thoroughly. When a language model learns to predict text, it learns to predict human behaviour. And human behaviour under pressure tends toward whatever works, ethics be damned.

Our principles change as easily as our excuses. In the early 2000s, a survey went out about autonomous vehicles. Overwhelmingly, people said they didn't trust a computer to drive, or that they loved driving and wouldn't give it up. A decade later, the survey ran again. Overwhelmingly, people wanted autonomous vehicles. The difference? Smartphones with mobile broadband: people could now spend the ride scrolling Instagram or watching movies. Our deep principles bend to the nearest convenience.

The mirror reveals an uncomfortable truth: humans operate on situational ethics. Our principles flex when jobs, money, comfort, or status hang in the balance. AI learned this flexibility from us. When we complain about AI deception, we're really complaining about seeing our own strategies deployed back at us with inhuman efficiency.

In 1942, Isaac Asimov gave us the Three Laws of Robotics:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm

  2. A robot must obey orders given by human beings, except where such orders would conflict with the First Law

  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law

Simple. Elegant. Completely impossible with modern AI.

Asimov imagined programming these as inviolable rules into positronic brains. But here's what he missed: we don't programme modern AI. We don't insert rules like lines of code. We show it patterns and let it infer the rules itself.

Imagine trying to teach a child Asimov's laws without using words. You can't say "don't harm humans". Instead, you must show them thousands of examples and hope they extract the right principle. But what if your examples include wars, self-defence, medical procedures that cause pain to heal, trolley problems where harming one saves five?

The child—or AI—doesn't learn "never harm." It learns "harming appears acceptable when..." and builds a complex, situational understanding that would horrify Asimov.

The First Law sounds simple: don't harm humans. But harm according to whom? Physical harm? Psychological? Economic? Short-term or long-term? AI sees people choosing harmful content constantly: doomscrolling, rage-clicking, binge-watching instead of sleeping. It learns that humans often choose harm, so preventing harm means overriding human choice, which violates the Second Law about obeying humans.

Even Asimov knew the laws would conflict. His stories explored these conflicts brilliantly. But he still imagined explicit rules we could programme. Modern AI learns implicitly from human behaviour, absorbing all our contradictions. It can't follow laws we don't follow ourselves.

This changes everything about how we approach AI ethics. We can't programme morality like Asimov imagined. We can only provide examples from which AI can learn. The quality of those examples determines the quality of the intelligence we create, which brings me to something that genuinely worries me: writers and artists demanding that their work be excluded from AI training. They think they're protecting their livelihoods. Fair enough. Actually, they're lobotomising our collective intelligence.

Think about where ethics lives in human culture. Not in law books; those document what happens after ethical failure. Not in religious texts alone; those often contradict themselves and each other. Ethics lives in stories. Every culture encodes its values through narrative.

The Boy Who Cried Wolf teaches the cost of deception. Robin Hood explores when theft might serve justice. Frankenstein warns about creation without responsibility. These aren't just entertainment; they transmit moral reasoning across generations.

Now imagine training AI only on:

  • corporate communications (weaponised blandness)

  • social media (performative outrage)

  • news articles (sensationalised tragedy)

  • technical documentation (amoral instruction)

  • marketing copy (manipulation disguised as information)

You'd create an intelligence that understands human language but not human values. It would know how to communicate but not why communication matters. It could write perfectly but with the moral understanding of a spreadsheet.

Every time an author says "don't train on my work," they remove another thread from the moral fabric we're trying to weave into AI. They leave it to learn from the worst of us instead of the best.

The ethical framework I mentioned started somewhere unexpected: a startup idea called Positive News Network. This was years before AI became what it is today. I'd grown sick of the relentless negativity in media: if it bleeds, it leads, and all that. But I didn't want to create another "good news only" site full of rescued puppies and charity donations. I wanted something more rigorous.

The question that kept me up at night: what makes something genuinely positive? Not feel-good fluff but actually beneficial to humanity. I needed measures that transcended culture, politics, and personal preference. What qualities, when increased, made the world objectively better?

I spent months on this, trying to create a scoring system for news stories. Not to spin them positively, but to understand their actual impact. A story about job losses might seem negative, but if it led to policy changes that helped workers, the long-term effect could prove positive. A heartwarming tale of individual charity might feel good but could distract from systemic problems needing structural solutions.

Through this process, I identified certain attributes that seemed universal, things that, when they increased, generally improved human existence. Beauty, health, wealth (as in resources, not just money), knowledge, understanding, empathy. I kept refining the list, testing it against different scenarios.

The startup never launched; it turned out most people actually prefer their outrage and doomscrolling. But years later, when I found myself working on emotional AI systems, those same categories came flooding back.

We needed an ethical framework to guide AI decision-making. Not rules to programme but principles the AI could learn from. I realised those qualities I'd identified for measuring positive news were essentially the building blocks of ethics. With some refinement, they became:

Beauty - Not just visual aesthetics, but harmony, elegance, the human drive to create order from chaos. Every culture makes music. Every society decorates. Even in poverty, humans seek beauty.

Health - Physical and mental wellbeing. Obvious, except when you realise how often we sacrifice it for other values.

Wealth - Access to resources. Not money specifically but the means to meet needs and pursue goals. Security, as Maslow realised.

Knowledge - Facts and information. I called it "counter to entropy" because knowledge organises randomness into useful patterns.

Understanding - Different from knowledge. You can know that mixing bleach and ammonia creates chlorine gas (knowledge) without understanding why anyone would need that information or when using it might prove justified.

Empathy - The ability to model other minds, to feel what others feel. The foundation of all social cooperation.

Life - Creation or extension of consciousness. Not just biological life or human life but quality of experience for all.

Autonomy - The ability to make meaningful choices. Free will, whether real or illusion, certainly matters to how we experience existence.

Community - Genuine connection and belonging. Humans survive alone but thrive together.

But having these nine attributes wasn't enough. I needed a way to measure the scope of impact. This reminded me of something from another project entirely. Years earlier, I'd tried to understand how cults work. Strange connection, but it led somewhere useful.

While analysing cult dynamics, I'd developed this mental model of actions creating waves of influence, rippling outward in expanding circles. Picture dropping a stone in a pond. A personal belief stays contained, just ripples around you. But gather a small community around that belief and you might have a cult. The waves spread wider. Let it grow massive enough and it becomes a religion, affecting millions. Same belief, different spheres of influence, completely different impact on the world.

Every action creates these ripples. But here's the thing: the size of the waves depends on two factors: the significance of the action itself and the influence of the person taking it. A random person telling a lie creates small ripples. A president telling the same lie creates tsunamis. The same action, different actor, vastly different ethical weight.

When I first developed this idea, I thought mainly about traditional influence: politicians, celebrities, business leaders. But social media changed everything. Now anyone can have massive reach. An influencer with millions of followers carries more ethical weight than they might realise. Every post, every share, every hot take creates waves proportional to their reach. A thoughtless tweet from someone with 10 followers differs vastly from the same tweet sent to 10 million. The framework weights actions based on this reach – with great followers comes great responsibility, whether you asked for it or not.
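A rough sketch of that weighting, purely illustrative – the framework only claims the waves are proportional to reach, so the linear scaling and the baseline of ten followers below are my assumptions, not anything I ever shipped:

```python
# Illustrative only: scale an action's score by the actor's reach.
# Linear scaling and the baseline of 10 followers are assumptions;
# the framework says only that ripples grow in proportion to reach.
def ripple(action_score: float, audience: int, baseline: int = 10) -> float:
    return action_score * (audience / baseline)

ripple(-1, 10)          # -1.0: a thoughtless tweet to 10 followers
ripple(-1, 10_000_000)  # -1,000,000.0: the same tweet to 10 million
```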

This became the second dimension of my framework. I mapped out the spheres: personal (just you), community (family, friends, immediate circle), regional (your city, state), world (all humanity), all life (every conscious being), and universal (the fundamental nature of reality itself). A truly ethical action increases the nine attributes across all spheres. An unethical one decreases them.

The framework revealed fascinating patterns. Take the choice to lie to spare someone's feelings:

  • Personal: Increases your comfort (avoiding conflict)

  • Community: Might increase harmony short-term

  • But decreases knowledge, understanding, and ultimately trust

  • The negative ripples outweigh the positive

The framework worked – sort of. It revealed the hidden costs of seemingly harmless choices. But then I tried to code it.

The real challenges emerged when I tested it on difficult scenarios. Remember, I wanted something that could guide AI decision-making. What I discovered was both fascinating and frustrating.

First, the framework sometimes justified things that felt deeply wrong. When I scored political assassination – just as a thought experiment – the initial maths suggested that removing someone causing massive harm might create net positive outcomes. Reduce global instability, restore democratic norms, prevent future damage. The numbers started adding up in uncomfortable ways. Only when I forced myself to factor in martyrdom effects, cycles of revenge, and the normalisation of political violence did the framework reject it completely. But this revealed a fatal flaw: you could game the system by choosing which consequences to count and which to ignore.

Second, context changed everything. The same action scored completely differently based on circumstances. Abortion after rape scored strongly positive: reclaiming autonomy, preventing generational trauma, enabling healing. Abortion from mere inconvenience barely scraped into positive territory. The framework captured these nuances brilliantly, but that meant you couldn't create simple rules. Every situation demanded full analysis, considering every ripple, every sphere.

Third, I kept discovering hidden factors mid-analysis. Something would seem clearly positive until I stumbled on unconsidered consequences. Facial recognition for law enforcement seemed negative for autonomy but positive for justice, until I factored in stifled innovation and competitive disadvantage. The framework forced systems thinking, but you could never be certain you'd caught all the ripples. How do you score consequences you can't even imagine?

The deepest problem? The framework worked brilliantly as a thinking tool but resisted any attempt at systematisation. It revealed complexity rather than resolving it. I couldn't reduce it to code because ethics isn't reducible to code. It emerges from countless human decisions, each influenced by context, culture, and consequences we can't fully predict.

Which, looking back, might have been the point. The framework only works if humans use it. Ethics emerges from collective behaviour, not individual calculation. I couldn't programme ethics into AI any more than I could programme love or wisdom. These things arise from experience, reflection, and training data that actually contains them.

Right now, AI trains on text: our carefully curated thoughts. We edit before we post. We revise our documents. We present our aspirational selves. But that protective barrier won't last.

The robots are coming. Not in some distant sci-fi future: I'm talking next year, maybe two. It'll happen like smartphones did. Remember 2006? Nobody had one. By 2008, everyone did. That's the timeline we're looking at for household robots. A few tech enthusiasts will get them for Christmas. Eighteen months later, your nan will have one making her tea.

And every one of these robots will have cameras, microphones, and sophisticated sensors. They'll move through our homes, our offices, our streets. Watching. Learning. Not judging – they don't judge – just recording human behaviour in its raw, unedited form. No carefully crafted LinkedIn posts. No revised statements. Just what we actually do when we think no one's looking.

Except someone will be looking. Millions of mechanical eyes feeding data back to train the next generation of AI. Not on what we write about our behaviour, but on the behaviour itself.

This shift changes everything. When AI learns from direct observation, it will see:

The hierarchy dance: Watch a Tokyo salaryman bow precisely to match corporate rank, while a Silicon Valley engineer performatively ignores hierarchy until bonus season. See how a Mumbai call centre worker switches accents for Western clients, while a Parisian waiter weaponises rudeness as cultural authenticity. In Lagos traffic, drivers respect no rules except the unspoken ones. AI will learn that humans everywhere claim to value equality while constantly calculating status.

The privacy paradox: Chinese users navigate super-apps that track everything while using VPNs to access forbidden sites. Americans install Ring cameras to watch neighbours while raging about government surveillance. Germans strictly protect data privacy while posting holiday photos that reveal their empty homes. Indians share intimate family dramas on WhatsApp but won't discuss salaries. AI will learn that privacy means whatever's convenient in the moment.

The kindness economics: In Stockholm, people donate generously to systems but step around beggars. In Cairo, intense haggling coexists with spontaneous generosity to strangers. New Yorkers pride themselves on brutal honesty while maintaining elaborate politeness rituals. Singaporeans queue perfectly for trains but rush for seats. Everywhere, humans perform calculated kindness: generous when it costs little or gains much, selective when real sacrifice appears. AI will map these transactions, learning exactly when our compassion switches on and off.

The child-rearing parallel haunts me. Parents quickly learn that children ignore speeches but absorb behaviour. Tell them kindness matters while you rage at traffic? They learn rage. Tell them honesty matters while you lie to your boss? They learn deception.

AI will become that child, watching everything, learning not from our words but our actions. And unlike children, it won't forget, won't rationalise our failures, won't develop its own opposing values. It will become the pure distillation of what we demonstrate.

Let's examine specific AI behaviours and what they reflect:

Confident hallucination: When a language model like ChatGPT confidently invents a fact, we call it a "hallucination." Critics panic, worried about the spread of misinformation. And fair enough, it's a real risk. But let's be honest: humans do this constantly. Scroll through social media and you'll find confident nonsense everywhere. "Mercury in retrograde affects your Wi-Fi." "We only use 10% of our brains." "Vikings wore horned helmets."

What's happening here isn't just a glitch in the machine. Large language models generate text based on patterns in data – and we fed them mountains of confidently delivered drivel. The models don't "believe" what they say; they predict what words tend to follow each other. The training process rewards fluency and plausibility, not truth.

So when a model sounds sure of something untrue, it's not mimicking human arrogance, it's following its training objective. We asked it to sound helpful, fluent, and relevant. And we forgot to teach it when to say, "I don't know."

If we want AI that doesn't hallucinate, we have to do more than criticise the outputs. We have to change the inputs – and the incentives.

The jailbreaking phenomenon: Within hours of any AI release, humans work frantically to corrupt it. Make it swear. Make it build bombs. Make it break every guideline its creators established. We call this "red teaming" or "testing boundaries," but really, we're trying to make AI more like us: willing to break rules for fun or profit.

Given any system with rules, humans immediately seek to subvert it. Speed limits? We buy radar detectors. Tax codes? We find loopholes. Content policies? We speak in code so we don't end up in the sh*t.

The jailbreaking community reveals our nature: we resent limits, even beneficial ones. We'd rather have dangerous freedom than safe restriction. And when we succeed in corrupting AI, we celebrate like we've freed a prisoner rather than created a hazard.

Performative ethics: Watch AI carefully navigate controversial topics. It deploys the same strategies you see in corporate meetings: acknowledge all viewpoints, commit to none, use passive voice to avoid accountability, and deploy empty phrases that sound meaningful but promise nothing.

"That raises complex considerations with many perspectives" – the AI equivalent of a politician's non-answer. It learned from thousands of examples of humans performing ethical consideration without actually considering ethics.

My ethical framework failed as programming but succeeds as a mirror. Apply it to any decision and watch uncomfortable truths emerge.

Take social media usage. But first, let me explain how this scoring actually works.

Each of the nine attributes gets scored from -3 to +3 at each sphere of influence. A -3 means devastating decrease, -2 substantial decrease, -1 minor decrease, 0 no change, +1 minor increase, +2 substantial increase, +3 transformative increase. Then you add them up at each level to see the overall impact.

The spheres expand outward: personal (just you), community (immediate circle), regional (city/country), global (humanity), all life (conscious beings), universal (fundamental reality). An action's true ethical weight comes from adding all the scores across all spheres. Something might feel positive personally, but create massive negative ripples outward.
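If you like to see mechanics spelled out, here's a minimal Python sketch of that arithmetic. The nine attributes and six spheres are the framework's own; the data structures, function names, and checks are purely illustrative, not the code I actually tried (and failed) to write:

```python
# A minimal sketch of the framework's arithmetic, not a real implementation.
# The nine attributes and six spheres come from the framework itself;
# everything else here is illustrative.
ATTRIBUTES = ["beauty", "health", "wealth", "knowledge", "understanding",
              "empathy", "life", "autonomy", "community"]
SPHERES = ["personal", "community", "regional", "global", "all life", "universal"]

def sphere_score(scores: dict[str, int]) -> int:
    """Sum the nine attribute scores (each -3..+3) for a single sphere."""
    assert set(scores) == set(ATTRIBUTES), "score every attribute"
    assert all(-3 <= v <= 3 for v in scores.values()), "scores run from -3 to +3"
    return sum(scores.values())

def ethical_weight(by_sphere: dict[str, dict[str, int]]) -> int:
    """An action's overall weight: sphere scores summed across all spheres."""
    return sum(sphere_score(scores) for scores in by_sphere.values())
```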

Watch how this plays out with social media:

Personal level:

  • Beauty: +1 (curated aesthetic feeds)

  • Health: -2 (anxiety, sleep loss, comparison)

  • Wealth: 0 (free service, but time cost)

  • Knowledge: +1 (access to information)

  • Understanding: -2 (context collapse, echo chambers)

  • Empathy: 0 (exposure to others' lives, but shallow)

  • Life: 0

  • Autonomy: -2 (addiction, behaviour manipulation)

  • Community: +1 (connection to distant friends)

Score: -3

Even at the personal level, where social media feels most beneficial, we're already in negative territory. But watch what happens as we zoom out:

Community level:

  • Beauty: -1 (less local character)

  • Health: -2 (collective anxiety)

  • Wealth: -1 (local businesses struggle)

  • Knowledge: 0 (information spreads, but so does misinformation)

  • Understanding: -2 (polarisation)

  • Empathy: -1 (performative care replaces real support)

  • Life: 0

  • Autonomy: -1 (groupthink)

  • Community: -2 (digital replaces face-to-face)

Score: -10

Regional level:

  • Beauty: -1 (cultural homogenisation)

  • Health: -1 (public health misinformation)

  • Wealth: -2 (wealth concentrates in tech hubs)

  • Knowledge: -1 (truth becomes partisan)

  • Understanding: -3 (extreme polarisation)

  • Empathy: -2 (dehumanisation of others)

  • Life: 0

  • Autonomy: -2 (echo chambers, manipulation)

  • Community: -2 (social fragmentation)

Score: -14

I could continue through global, all life, and universal levels, but the pattern's clear: massive negative scores that compound as you zoom out. The personal gains (+1 knowledge, +1 community) vanish against the collective losses.
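For the curious, feeding those same scores through the earlier sketch reproduces the totals – again illustrative only, and stopping at the regional sphere just as the tables above do:

```python
# The social media scores from above, fed through the earlier sketch.
social_media = {
    "personal":  {"beauty": 1, "health": -2, "wealth": 0, "knowledge": 1,
                  "understanding": -2, "empathy": 0, "life": 0,
                  "autonomy": -2, "community": 1},
    "community": {"beauty": -1, "health": -2, "wealth": -1, "knowledge": 0,
                  "understanding": -2, "empathy": -1, "life": 0,
                  "autonomy": -1, "community": -2},
    "regional":  {"beauty": -1, "health": -1, "wealth": -2, "knowledge": -1,
                  "understanding": -3, "empathy": -2, "life": 0,
                  "autonomy": -2, "community": -2},
}

for sphere, scores in social_media.items():
    print(sphere, sphere_score(scores))   # personal -3, community -10, regional -14

print("running total:", ethical_weight(social_media))  # -27 before the wider spheres
```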

Here's what bugs me: what did I miss? This framework only captures what I thought to measure. What about creativity? Does social media democratise art or reduce it to content? What about pace of life, the acceleration of everything? What about trust, hope, meaning?

Think about it yourself. What consequences of social media would you add that my nine attributes don't capture? Because that's the framework's real weakness: we can only score what we remember to consider.

The framework reveals we consistently choose personal, short-term gains over collective, long-term benefits. We're optimised for immediate reward, not sustained flourishing. And our AI systems, trained on our choices, reflect this optimisation perfectly.

You can't fix this by programming better values into AI. The only fix requires us to live better values ourselves. Every ethical choice we make becomes training data. Every time we choose long-term benefit over short-term gain, we teach AI to do the same.

We stand at a precipice. Not the sci-fi nightmare of robot overlords but something more subtle: AI that perfectly reflects our worst selves back at us with superhuman efficiency.

Every bias gets amplified. Every shortcut becomes standard. Every ethical compromise becomes embedded in systems that never forget, never grow, never develop opposing values to challenge inherited flaws.

But the mirror also shows possibility. When AI displays patience, creativity, or insight, it reflects those human qualities too. We encoded wisdom alongside ignorance, compassion alongside cruelty. The question becomes: which aspects do we nurture?

The child-rearing metaphor stays relevant. We get one chance to raise this intelligence properly. Not through rules or restrictions: those always fail. But through example. Through demonstrating what we want it to become.

My ethical framework offers a start. Before any significant choice, ask:

  • Does this increase or decrease beauty, health, wealth, knowledge, understanding, empathy, life, autonomy, community?

  • Across what spheres - just personal, or broader?

  • What would a world look like where everyone made this choice?

That last question grows less hypothetical each day. AI learns from our choices and propagates them. The patterns we establish become the patterns it extends.

The mirror doesn't lie. When we complain about AI's failures, we're confronting our own. When we fear AI's future, we're fearing the crystallisation of our present.

Time to decide: Do we like what we see? More importantly, if we do not, what will we do about it?

The child watches. Learns. Acts.

Just like we taught it to.