Chapter 3: The History of Artificial Intelligence
Summary
Chapter 3 traces AI development from ancient automata through present systems, establishing one consistent pattern: each generation underestimates the pace of capability expansion. The chapter uses historical evidence to argue that predictions about AI timelines have proven systematically too conservative.
More importantly, the chapter demonstrates that AI doesn't need consciousness, understanding, or true reasoning to eliminate human expertise. Sophisticated pattern matching (what current systems do) suffices to absorb entire knowledge domains and outperform humans within them.
Key Arguments
- The Turing test became irrelevant the moment we passed it: ChatGPT passes tests designed to detect human-level conversation, yet nobody claims it's conscious. The entire framework of consciousness-as-prerequisite collapsed
- Expertise is just pattern recognition at scale: Humans accumulate decades learning patterns. AI learns those patterns from all publicly available data in weeks. The time advantage vanishes
- The prediction gap proved asymmetric: Pessimists who said "AI will never do X" proved wrong. Optimists who said "AI will do X by date Y" proved right more often than not. Conservative estimates consistently underestimated progress
- Capability compounds faster than predictions account for: AI improves AI research, which creates better AI, forming feedback loops that compress timelines beyond what even optimistic forecasts suggested
Key Concepts Developed
- The commodification of expertise: What took decades to learn becomes transferable to machines in hours of training data exposure
- Pattern matching as sufficient: Whether or not machines "understand" becomes irrelevant when their predictions match or exceed human accuracy
- Capability lag in self-understanding: Humans built systems whose capabilities exceeded our ability to predict them. We built the thing before we understood it
Historical Arc
The chapter traces: ancient automata (fascination with mechanical systems) → symbolic AI (if we encode rules explicitly, machines can reason) → expert systems (capture domain knowledge in rules) → connectionist approaches (let patterns emerge from data) → deep learning (scale neural networks, let patterns emerge) → large language models (scale to all text data, general patterns emerge).
Each transition happened because previous approaches hit limits. Each transition arrived faster than previous generations predicted.
What the Chapter Actually Argues
Conventional narrative: AI development is a story of steady progress toward increasingly capable systems.
What the chapter argues: The story is one of systematically underestimated capability emergence. The conservative position ("AI will never match human ability at X") keeps proving false on shorter timescales than even optimistic predictions suggested. Society should update its baseline expectations—the pattern suggests continued underestimation.
Evidence Used
- Expert consensus from decades past: "Machine translation will never work" (proven false by Google Translate). "Machines will never beat a world chess champion" (Deep Blue, 1997). "Machines will never recognise images" (AlexNet, 2012). "Machines will never generate coherent text" (GPT progression)
- The capability timeline compression: the four years from GPT-1 (2018) to ChatGPT (late 2022) marked the fastest journey yet from major AI breakthrough to mainstream deployment
- The empirical record: AI researchers' own timeline predictions (made in 2010) vs actual deployment (occurred by 2020)
Counterarguments Addressed
The chapter acknowledges that "AI is overhyped" in specific application claims whilst simultaneously noting that core capability expansion consistently exceeds sceptical predictions. It's possible both are true: specific marketing claims are overstated AND underlying capability has expanded faster than previous predictions anticipated.
Editorial Notes
This chapter succeeds by grounding abstract debates (will AI become conscious? Will AI match human reasoning?) in empirical historical pattern. The actual question the book cares about isn't consciousness—it's capability. And the capability question has been empirically resolved: machines can absorb and deploy human expertise faster than humans can learn it. Whether that represents "understanding" becomes philosophically interesting but practically irrelevant.
Manuscript Content
The text below mirrors the current source-of-truth manuscript at chapters/03-chapter-3.md (synced from the Google Doc on 2026-04-20). Treat this section as read-only reference; edit the chapter file, not this wiki page.
Chapter 3
Most people think of AI as a recent arrival, born in the age of smartphones and social media. Yet its roots reach deeper, predating even the first mechanical calculators and tapping into ancient dreams, thousands of years old, of companions that think with us. Fast forward to the 1950s; over the last seventy years, the artificial intelligence community has worked to bridge the gap between possibility and understanding, striving to articulate a future that defies simple narratives. I find myself drawn to these earlier years of the AI story, where each generation reimagined the possibility of artificial minds through the lens of their era's understanding.

Hero of Alexandria created Indiana Jones-style automated temple doors and mechanical theatres in the first century CE. The Banū Mūsā brothers designed stunningly beautiful automated musical instruments and water clocks in 9th-century Baghdad. Al-Jazari built remarkably sophisticated automated musicians and servants in 12th-century Mesopotamia. In Europe, Leonardo da Vinci sketched designs for a mechanical knight that could sit up and move its arms. Each inventor reached toward that tantalising horizon where human ingenuity might spark genuine machine intelligence, their creations embodying both the technical limitations and boundless aspirations of their times.

The modern chapter began in the afterglow of World War II, when Alan Turing's unconventional mind transformed mathematical abstraction into life-saving reality. In the quiet halls of Bletchley Park, his insights into mechanical computation didn't just crack enemy codes, they demonstrated how machines could extend the boundaries of human thought. This opened doorways that reshaped our understanding of machine intelligence, leading to a gathering of optimistic thinkers at Dartmouth College in 1956, with fresh memories of machines that could perform seemingly impossible calculations. They coined the term artificial intelligence, but more importantly, they crystallised a vision that had lingered in human imagination for centuries.

The path moved forward in cycles of breakthrough and setback. The early years hummed with possibility. Machines proved mathematical theorems and learned to play games like checkers and chess. This early work pointed to the possibility of human-like reasoning: theorem provers to unlock new areas of science, expert systems to revolutionise business, and even autonomous machines to release us from mundane tasks. After 20 years of effort and funding, none of these promises materialised. In the 1970s, as the Cold War intensified, considerable hope was placed in English-to-Russian translation systems. It became clear that these rigid systems could not handle idioms, context, and semantic ambiguity, which make up the majority of human communication. The simple rule-based systems crumbled when faced with the intricate messiness of the real world. Shakey the Robot (developed by what's now known as SRI International in the late 1960s and early 1970s) was an early AI-driven robot that used rule-based planning to navigate an environment. It could perform basic tasks like pushing boxes, but was incredibly slow and brittle. It struggled with real-world unpredictability, sensor noise, and minor environmental changes. The rigid rule-based planning made it impractical for real-world applications. This and other high-profile failures caused the funding to evaporate. Critics who had dismissed the entire enterprise as fantasy seemed vindicated.
The field entered what we now call its first "AI winter". Stubborn, dedicated researchers continued their work away from the spotlight. The 1980s finally saw renewed enthusiasm as Japan launched its ambitious Fifth Generation Computer Systems project and expert systems found their way into hospitals and financial institutions. Wall Street firms and investment banks began using AI-driven quantitative models for their trading activities. These models, often based on rule-based expert systems, attempted to identify patterns in market data to make buy/sell decisions. While primitive by today's standards, they represented the first steps toward algorithmic trading. One of the most infamous examples of AI-like systems affecting the markets was Black Monday (19 October 1987), when the Dow Jones Industrial Average dropped 22.6% in a single day. AI-driven "program trading", which used predefined rules to trigger automated trades, played a role in accelerating the crash. Many firms had set up trading algorithms that automatically sold stocks when prices dropped to a certain threshold, leading to a cascade effect. Once again, the systems proved brittle, unable to learn or adapt. By the early 1990s, AI entered another quiet period of reassessment and refinement: "winter" number two.

Modern AI emerged from the miraculous convergence of five separate technologies, all arriving at just the right time.

The Five
1 The internet
In the Cold War corridors of the US Department of Defense, engineers confronted a stark challenge: how might a communication system survive partial destruction? While citizens across the United States constructed backyard bunkers against nuclear threats, DARPA (the Defense Advanced Research Projects Agency) dreamed up a decentralised network. From this American military crucible emerged ARPANET, sending its first fragmentary message across California, from UCLA to the Stanford Research Institute, on 29 October 1969. The system attempted "LOGIN" but managed only "LO" before crashing. That stuttering first step would eventually lead to the birth of the internet. Over the course of two decades, ARPANET evolved beyond its American origins, criss-crossing the world as its institutional connections expanded and TCP/IP was adopted as its communication protocol in 1983. While powerful in its connectivity, this resilient skeleton, born from American military paranoia, needed more: it lacked accessibility, usability, and a human interface.
2 The web
Meanwhile, across the Atlantic... In the winding corridors of CERN (Conseil Européen pour la Recherche Nucléaire), nestled on the Franco-Swiss border, British engineer Tim Berners-Lee confronted a different problem: information chaos. European scientists worked on separate computers, each with its own unique data storage methods. While others saw mere technical inconvenience, Berners-Lee envisioned something grander: a seamless web of knowledge spanning the globe. His 1989 proposal at this European laboratory started modestly, under the utilitarian name "Mesh". But something about that mechanical term failed to capture the organic, interconnected nature of his vision. When he later christened it the World Wide Web, he claimed with characteristic British understatement that it "just sounded good". Yet in that simple name lay profound poetry, a metaphor that captured both the delicate interconnections and vast scope of what would become humanity's shared digital tapestry. By 1991, Berners-Lee had created the first web browser and website, choosing not to patent his invention to encourage open adoption. The web transformed the internet from a resilient communication system into humanity's first global knowledge commons. The American-born internet and the European-conceived web created something remarkable: the largest, most accessible repository of human knowledge in history. While not universal or complete – vast swathes of human wisdom, particularly from oral traditions and non-Western sources, still await digitisation – it grows daily, fed by millions of contributors across the globe. From a crashed "LO" in California to billions of daily connections worldwide, this network has transformed how humanity shares, discovers, and builds upon its collective wisdom.
3 Language models
For millennia, language flowed exclusively through human minds and mouths. Like water through ancient riverbeds, our words carved paths of meaning that machines could not follow. Early computers excelled at mathematical precision but stumbled over the simplest conversations. They could calculate orbital trajectories, yet struggled to grasp why we call a table “table”. Computer scientists initially approached this challenge like cartographers mapping an unknown land. They charted careful rules: nouns go here, verbs there, commas in these precise locations. Yet human language still proved too wild for such rigid borders. Our words spill over boundaries, shift meanings, break rules – then remake them. The map could never capture the territory. Machine learning changed the game by embracing the beautiful chaos of language with a new idea: what if, instead of giving computers rules, we let them find patterns in data? Rather than imposing order from above, this new approach lets patterns emerge from below. Think of a child learning to speak – not through grammar books, but through immersion in a sea of words. The timing proved providential. The web had transformed human expression into digital form at an unprecedented scale. Every tweet, blog post, and digital book added drops to a vast ocean of text. From this abundance, machines began to drink deeply, discovering patterns humans had never consciously noticed. Computer scientists realised that, rather than manually defining language, they could train machines to recognise relationships between words simply by exposing them to massive amounts of text. These statistical language models marked a profound shift in the field of artificial intelligence. Rather than following rigid rules, machines now navigate probability spaces. Like a reader who anticipates the end of a sentence, they learn to predict what words might come next. Crude at first, these predictions hinted at deeper possibilities. At Google in the early 2010s, Tomas Mikolov and his team achieved a remarkable breakthrough. Their Word2Vec model transformed words into mathematical vectors, creating a landscape where meaning emerged from the relative position and distance of words. In this space, words clustered like stars in constellations of related concepts. The often-cited equation – King - Man + Woman = Queen – revealed something profound: machines had begun to grasp analogy, that most human of mental leaps (they had also begun to discover the concept of the patriarchy, but that’s a discussion for later). Perhaps most fascinating, these mathematical word-spaces showed similar shapes across different languages. The French word "chien" and the English word "dog" occupied parallel positions in their respective vector spaces. This suggested something profound about human language itself: beneath our surface differences lie shared patterns of meaning, mathematical echoes of our common humanity. The web had provided the raw material, digitising human knowledge into a form machines could digest. Yet what emerged went beyond mere data processing. These systems began to reveal the hidden geometry of human thought, mapped in the mathematics of meaning.
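For readers who want to poke at this geometry themselves, a few lines of Python are enough. The sketch below relies on the open-source gensim library and a publicly released set of pretrained Word2Vec vectors; the library choice, the model name, and the exact neighbours returned are illustrative assumptions rather than details from the original research:

```python
# Illustrative sketch only: probing Word2Vec's "geometry of meaning" using
# pretrained vectors fetched through the gensim library.
import gensim.downloader as api

# Downloads the publicly released Google News vectors (~1.6 GB) on first run.
vectors = api.load("word2vec-google-news-300")

# The often-cited analogy: king - man + woman ≈ ?
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" typically appears as the closest match.

# Related concepts cluster together: nearest neighbours of "dog".
print(vectors.most_similar("dog", topn=5))
```

Swapping in other word combinations shows the same constellations described above: related concepts sitting close together, analogies traced as directions through the space.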
4 Attention and transformers
In the early days of machine learning, teaching computers to process language felt akin to training someone to read in the dark with a narrow flashlight. The dominant models of the time, recurrent neural networks (RNNs), read text one word at a time, only able to see the current word and the few that came before it. If the sentence was short, this worked well enough. But the moment it grew longer – like a novel instead of a tweet – the model would start forgetting important details. A phrase at the end of a paragraph could completely change its meaning but, by the time the model reached it, the crucial earlier words had essentially faded from memory.

In the mid-2010s, a few years after Mikolov's work, a group of researchers in another part of Google wrestled with this challenge. They were trying to make computers translate languages more fluidly. One researcher, Ashish Vaswani, continually encountered the same issue: the current models simply lacked efficiency. He found them slow and forgetful, struggling with long-range dependencies. That's when he and his colleagues had a breakthrough. Instead of treating text like a sequence of words that must be processed one after the other, what if the model could look at everything at once and decide which words mattered most? Rather than reading with a flashlight, what if the model had a floodlight, illuminating the entire sentence instantly and focusing only on the most relevant parts? This idea became known as the "attention mechanism". With attention, a model didn't have to process words in order; it could assign different levels of importance to words, dynamically adjusting its focus. In a sentence like, "The cat, which had been stuck in the tree for hours, finally jumped down when it saw its owner", an attention-based model could recognise that "cat" and "its owner" were closely related, even though a long clause separated them.

In 2017, Vaswani's team published "Attention Is All You Need", introducing transformers, a revolutionary architecture that processes text in parallel rather than sequentially. This allowed models to scan entire documents simultaneously, grasping context far more efficiently than previous approaches. But there was a catch: they needed enormous computing power to work at scale. Without that raw power, they remained an elegant idea waiting for the world to catch up. At the time of their invention, the world was on the verge of something big, but not quite ready. The internet provided the infrastructure, the web gathered the data, transformers provided the architecture, but one last ingredient was missing: the sheer computational muscle to bring them fully to life.
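Before turning to that missing ingredient, it may help to see the attention trick in miniature. The numpy sketch below shows scaled dot-product attention, the core operation in the 2017 paper; the matrices and sizes are invented purely for illustration, whereas real transformers learn these values from data and stack many such layers:

```python
# Minimal sketch of scaled dot-product attention ("Attention Is All You Need", 2017).
# Q, K, V are toy matrices: one row per word, one column per feature.
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how strongly each word attends to every other word
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row of weights sums to 1
    return weights @ V                              # each output blends all words, weighted by relevance

# Four "words", eight features each; the whole sentence is examined at once,
# with no left-to-right reading required.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (4, 8)
```

Because every row of weights is computed independently, the whole sentence can be attended to at the same time – the floodlight rather than the flashlight.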
5 Chips
In the 1990s, video games pushed computers to their limits. Each new generation of games introduced more detailed environments, more complex physics, and richer visual effects. Game developers required more power, but CPUs – the processing chips running their computers – proved too slow. Rendering 3D graphics involved an overwhelming number of calculations, with every tiny movement demanding precise mathematical adjustments. One company recognised the problem clearly: Nvidia. Founded in 1993, Nvidia specialised in building graphics processing units (GPUs), designed specifically to handle the massive mathematical workload of rendering games. Unlike CPUs, which process tasks sequentially, GPUs utilise parallel computing, breaking tasks into thousands of smaller calculations and solving them simultaneously. The parallel processing of these chips would later perfectly complement the parallel nature of the transformer architecture.

For decades, artificial intelligence researchers believed that neural networks – models inspired by the structure of the human brain – could enable computers to recognise speech, process language, and even make decisions. However, training a deep neural network required feeding it enormous amounts of data, adjusting millions (eventually billions) of parameters, and running vast numbers of calculations. CPUs simply lacked the power to process this efficiently. Training a large model could take weeks or even months, making progress painfully slow. In the early 2000s, no one in the AI community took GPUs seriously – after all, what could gaming hardware offer serious scientific computing? In 2009, Geoffrey Hinton and his students at the University of Toronto challenged that assumption. They adapted their neural network algorithms to run on GPUs, testing them against traditional image recognition tasks. What once took weeks now took days. Deep learning had found its engine. This shift caught the attention of Nvidia's CEO, Jensen Huang. Rather than dismiss this unexpected use of its hardware, Nvidia embraced it and began optimising its chips specifically for AI acceleration. The improvement was so dramatic that even the sceptics had to acknowledge its significance.

Then came the moment that changed everything. In 2012, Hinton's team entered the ImageNet competition, an annual contest where AI models competed to classify objects in photographs. Using their GPU-powered deep learning model, they dominated the competition, achieving a level of accuracy far beyond any previous entry. For the first time, neural networks outperformed traditional AI models. AI had struggled for decades, waiting for the right hardware to unlock its potential. The answer had existed all along – in video game technology.

After the ImageNet moment, everything accelerated. Nvidia, once focused primarily on gaming, pivoted towards AI, optimising its GPUs for deep learning. Researchers realised that if GPUs could enhance image recognition, they could also power language models. Over the next decade, Nvidia's GPUs became integral to the evolution of AI. They powered the development of transformers and enabled language models to process entire sentences at once. They made it possible to train increasingly sophisticated models, such as Word2Vec, BERT, and ultimately GPT. In 2018, the GPT-1 model demonstrated how 117 million mathematical connections – what researchers refer to as parameters – could capture subtle patterns in human expression.
Just a year later, GPT-2 scaled this approach to 1.5 billion parameters, hinting at untapped potential. Then came GPT-3 in 2020, a leap so dramatic it seemed to bend the curve of possibility – 175 billion parameters working in concert to generate human-like text, a hundred-fold increase made possible by advances in computer chip design and parallel processing. When the public-facing ChatGPT arrived in late 2022, it transformed these technical capabilities into something profoundly accessible. Each iteration represented not just an increase in scale but a fundamental shift in capability, like watching evolution accelerated through a time-lapse lens. The specialised chips that enabled these breakthroughs had evolved alongside the algorithms, each advance in computing power unlocking new possibilities in model size and capability.

An amazing discovery emerged as researchers began to process different languages through transformers: all the languages had the same shape! Imagine a cloud of words in which related concepts sit close to one another and point in particular directions relative to each other. These patterns repeat across different languages. We saw the results of this as Google and others went from translating 10 of the most popular languages to launching dozens of new ones, including fictional languages like Klingon and Elvish. There are so many examples of what has been unlocked. I love that at the Earth Species Project, Aza Raskin and his team began exploring whether these same principles might help us understand the languages of other species – the complex vocalisations of whales that have evolved over 34 million years, passed down through generations in songs that travel across ocean basins. If human languages, despite their surface differences, share a universal structure, might there also be patterns of meaning we share with other species who inhabit this same world? The mathematics that revealed the hidden geometries of human language might help us discover bridges of understanding across the species barrier.

We find ourselves awkwardly navigating tools that seem simultaneously powerful and exceedingly unwieldy. Each new capability raises questions about how we might thoughtfully integrate these technologies into our lives and society. The pace of development has quickened dramatically, with new models and capabilities emerging almost weekly. What makes this moment different isn't just technical capability – impressive as it may be – but how AI has begun to weave itself into daily life. Previous AI winters came when the technology remained confined to research labs and specialised applications. Today's AI tools are already embedded in the products and services we use every day. Who knows what model we will have reached and what abilities it will have when you read these words? As artificial intelligence demonstrates capabilities that transcend previous limitations, our collective imagination struggles to keep pace with reality. We find ourselves in the midst of transformation, yet our discussions circle through cycles of hype and dismissal, missing the deeper currents of change reshaping our relationship with technology. As the boundaries of possibility expand outward, people shy away from imagining how profound these changes may be. To see what is possible, we must pay attention to what is happening around us. As Aza Raskin said, "Our ability to understand is limited by our ability to perceive.
If we can throw open the apertures of what we can perceive, that throws open the aperture of what we can understand."

I'm obsessed with the idea of what comes next. I wrote my first piece of software on an old Compaq portable computer. It was a beast of a machine that folded into something the size of a large piece of carry-on luggage. It had a clunky keyboard, a massive power cord, and a small pixelated screen with boxy green characters. My sister repeatedly pointed out that, although marketed as "the first portable computer", my slight eight-year-old frame could barely pick it up. Swollen with defiance and determination, I dragged it across the living room carpet to the other side of the room, leaving a trail of disruption in the carpet fibres. I plonked it down, plugged it in, and willed it along as I exasperatedly awaited its long boot cycle.

Around that time, Asimov's The Caves of Steel had ignited my imagination, with humanity's fear and distrust of robots playing out against the backdrop of a far-future Earth. The novel's vision of the year 4000 – with its humaniform robots, indistinguishable from flesh-and-blood humans, just beginning to bridge the divide between artificial and organic intelligence – struck me as simultaneously wondrous and absurd. Why would it take two millennia more to create truly human-like machines? Fresh from ruining my mother's carpet, I found this timeline an affront to the possibilities I sensed humming within the beige box that sat in front of me. I finally settled cross-legged on the floor before its phosphorescent green display and started a new program in BASIC (a very simple programming language that was a precursor to many we use today). I bashed away at the keyboard, fuelled by a childhood conviction that I could collapse those thousands of years into a single afternoon of programming.

My conviction to accelerate the future hasn't dimmed; it has evolved into something more powerful: a drive to illuminate the pathways of technological evolution through shared understanding. Where once I believed transformation could spring from solitary determination and lines of BASIC code, I now see how collective insight shapes the trajectory of innovation. Real progress emerges from our shared imagination and collaborative exploration across all walks of life. As more voices join this dialogue, we expand our understanding of what's possible. Each shared observation and public discussion of emerging trends contributes to a deeper comprehension of humanity's technological future. This shared exploration helps shape what comes next, guiding our choices and innovations through the power of collective vision. By making these possibilities visible and examining them openly, we transform abstract potential into actionable insight.

It took six weeks rather than a single afternoon, but I had created something that bore little resemblance to Asimov's robots: a crude, primitive program that barely functioned. Yet, in its very limitations, I found it perfect. I had written a cascading set of "if…then" statements that could "understand" what a user said and respond appropriately. The code was ugly, but the outcome was beautiful. I could converse with my machine.

We find ourselves in a moment much like my first encounter with that Compaq computer: awkwardly navigating our relationship with a technology that promises to transform everything.
Like that eight-year-old dragging a "portable" computer across carpet fibres, we're learning to move forward with tools that are simultaneously powerful and unwieldy. Our early conversations with AI echo those first tentative lines of BASIC code: experimental, imperfect, yet charged with possibility. The parallel goes deeper than just technological awkwardness. There's that same blend of optimism and uncertainty, the tension between wanting to rush forward and needing to proceed thoughtfully. Just as I once sat cross-legged before a green phosphor screen, convinced I could program my way to the future, society now sits at the threshold of artificial intelligence, both eager and apprehensive about the conversations ahead. I have this question: do you sit there and stare at the green screen of your future and wait to see what happens to you next, or do you try to shape what it can do for us? Let's explore what might happen if we take charge and start tapping those keys.