AI’s Jekyll-and-Hyde Tipping Point
The Science of When Good AI Will Go Bad
Neil Johnson
Professor of Physics
Leader, Dynamic Online Networks Lab
George Washington University
Sponsored by PSW Science Member Tim Thomas
About the Lecture
This lecture will address a pressing question: Why Do Trusted AI Systems Suddenly Turn Dangerous?
We are deploying AI into our most critical sectors, yet AI behaviors remain dangerously unpredictable. A reliable legal AI can fabricate case law; a compliant financial AI can trigger a flash crash; a teenager’s trusted AI companion can encourage self-harm. These failures are not random “hallucinations.” New research shows they are a predictable feature of an AI’s core design, a Jekyll-and-Hyde switch waiting to be flipped.
The lecture will describe a proposed Predictive Formula for AI Failure, built on a novel physics-based approach to making the AI black box transparent. By mapping the ‘atom’ of modern AI onto a well-understood physical system, it has been possible to derive a simple but essentially exact formula that predicts the precise moment an AI will tip from good to bad. The lecture will explain how an AI’s “attention” can spread so thin that it suddenly snaps, locking onto a new, often harmful, narrative.
The lecture will discuss the implications of moving from unforeseeable risk to manageable liability, and of moving AI safety from a reactive art to a proactive science. For the legal profession, it reframes AI failure from an unforeseeable accident into a foreseeable, and thus legally liable, product defect. For the defense and intelligence communities, it provides a new framework for assuring mission-critical reliability. It gives policymakers, business leaders, healthcare professionals, and the public a firm scientific platform for managing AI risk and the engineering tools to finally build AI we can trust.
Additional Information Regarding the Lecture
(1) The Root of AI Hallucinations: Physics Theory Digs Into the ‘Attention’ Flaw. SecurityWeek. https://www.securityweek.com/the-root-of-ai-hallucinations-physics-theory-digs-into-the-attention-flaw/
(2) Jekyll-and-Hyde Tipping Point in an AI’s Behavior. arXiv. https://arxiv.org/abs/2504.20980
About the Speaker
Neil F. Johnson is Professor of Physics at George Washington University, where he leads the Dynamic Online Networks Lab. Before joining GWU, he was Professor of Physics and Head of the Complexity Research Group at the University of Oxford, and later Professor of Physics at the University of Miami.
Neil is widely recognized for pioneering work in complexity theory and collective behavior, applying physics-based modeling to domains from financial markets to insurgencies, cyberattacks, pandemics, online extremism, and AI‑driven misinformation. His research continues to push into areas such as shock‑wave behavior in distrust ecosystems and control of bad‑actor AI at scale.
Neil is an author of more than 300 peer-reviewed scientific publications and several books, including “Financial Market Complexity” (Oxford University Press, 2003) and “Simply Complexity: A Clear Guide to Complexity Theory” (Oneworld Publications, 2009). He has a new book coming out in September 2025: “Online-Offline Complexity: The New Physics of Interacting Humans, Technology and AI” (Oxford University Press).
Among other honors and awards, Neil is an elected Fellow of the American Physical Society (APS), received the APS Burton Forum Award, and delivered the Royal Institution Lectures on BBC TV in 1999.
Neil earned a BA/MA in Physics at St John’s College, Cambridge, receiving the top First and the Hartree and Maxwell prizes, and a PhD from Harvard University as a Kennedy Scholar.
Additional Information About the Speaker
[1]: https://physics.columbian.gwu.edu/neil-johnson “Johnson, Neil | Department of Physics”
[2]: https://en.wikipedia.org/wiki/Neil_F._Johnson “Neil F. Johnson”
Webpage(s):
https://physics.columbian.gwu.edu/neil-johnson
https://donlab.columbian.gwu.edu
Minutes
On October 17, 2025, Members of the Society and guests joined the speaker for a reception and dinner at 5:45 p.m. in the Members’ Dining Room at the Cosmos Club in Washington, D.C. Thereafter they joined other attendees in the Powell Auditorium, where President Larry Millstein called the lecture portion of the 2,523rd meeting of the Society to order at 8:10 p.m. ET. He began by welcoming attendees, thanking sponsors for their support, announcing new members, and inviting guests to join the Society. Scott Mathews then read the minutes of the previous meeting, which included the lecture by Adam Riess, titled “Measuring the Universe: From Hubble’s Discovery to the Hubble Tension”. The minutes were approved as read.
President Millstein then introduced the speaker for the evening, Neil Johnson, of George Washington University. His lecture was titled “AI’s Jekyll-and-Hyde Tipping Point: The Science of When Good AI Will Go Bad”.
The speaker began by describing the layers in generative AI by analogy with people in a movie theater passing messages forward from one row to the next. He described how the series of layers creates the next “token” based on its training, appends that token to the prompt, and cycles the result back to the input layer to repeat the process. He used the term “attention head” for each of the computational units making up a layer. He said that even the people writing the code and defining the structure of large language models do not actually understand how these models work, and that this lack of understanding leads to a lack of trust. He gave the example of flying in a modern jet airplane: even if we do not personally understand the physics and engineering of how the airplane flies, we know that someone does, and we therefore trust that the airplane will get us to our destination.
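In outline, that generate-append-repeat loop can be sketched in a few lines of Python. This is a toy illustration only; the `next_token` function stands in for the entire stack of attention layers and simply replays a canned continuation.

```python
# Toy sketch of the autoregressive loop described above: each pass
# through the layers yields one token, which is appended to the prompt
# and cycled back to the input. All names here are illustrative.

CANNED = ["The", "cherry", "blossoms", "are", "lovely", "."]

def next_token(step: int) -> str:
    """Stand-in for a full forward pass through every attention layer."""
    return CANNED[step % len(CANNED)]

def generate(prompt: list[str], max_new_tokens: int = 6) -> list[str]:
    tokens = list(prompt)
    for step in range(max_new_tokens):
        tokens.append(next_token(step))  # append, then cycle back around
    return tokens

print(" ".join(generate(["What's", "nice", "in", "DC?"])))
```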
Johnson discussed some recent headlines about the ways generative AI is being used, including personal relationships, government reports, driverless cars, emotional therapy, medical diagnosis, and plumbing. He said that the ramifications of generative AI producing bad output are serious, and “not like getting a bad glass of beer”.
The speaker discussed the idea of an embedding space: a multi-dimensional space in which data such as words, images, or other items are represented as vectors. He noted that an AI’s training groups similar tokens at nearby locations in the embedding space. He said that the input, or prompt, places the AI at an initial point in the embedding space, and that subsequent layers generate additional tokens that are appended to the prompt. He described how each additional token constitutes a displacement vector, ideally moving the AI closer to a “good answer”, and how each displacement is evaluated using the dot product.
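A toy numerical version of that picture, using invented two-dimensional embeddings (real models use hundreds or thousands of dimensions), might look like this:

```python
import numpy as np

# Invented 2-D "embedding space" for illustration only; real models
# use hundreds or thousands of dimensions.
good_answer = np.array([1.0, 0.2])   # location of a "good answer"
position = np.array([0.1, 0.1])      # where the prompt initially lands

# Candidate next tokens, each represented as a displacement vector.
candidates = {
    "monuments": np.array([0.8, 0.1]),
    "trees": np.array([0.2, 0.9]),
}

# Each displacement is scored by its dot product with the direction
# from the current position toward the good answer.
for word, displacement in candidates.items():
    score = np.dot(displacement, good_answer - position)
    print(f"{word}: dot-product score = {score:.2f}")
```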
The speaker discussed how this iterative process can create bad tokens or incorrect output, giving the example of asking “What’s nice in DC?” If the dot product between the tokens for “nice” and “trees” is very large, the displacement vector of the subsequent token can move the AI’s position in embedding space from DC into Maryland, because Maryland has more trees. He mentioned techniques that AI companies use to prevent “undesirable output”, including fine-tuning and retrieval-augmented generation. Retrieval-augmented generation works by first retrieving relevant information from an external knowledge base or document store, and then using that retrieved information to “augment” the prompt before the LLM generates the final answer. Johnson said that recent research indicates that such augmentation may create instability in the output of AIs.
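In outline, a retrieval-augmented pipeline of the kind described might be sketched as follows; the keyword retriever and document list here are stand-ins for a real vector-database lookup, not any specific product:

```python
# Minimal RAG sketch: retrieve relevant passages, then prepend them to
# the prompt so the model answers from supplied context. Illustrative only.

DOCUMENTS = [
    "The National Mall in DC is lined with cherry trees.",
    "Maryland's state parks contain extensive forests.",
    "The Smithsonian museums in DC offer free admission.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Crude keyword overlap standing in for a vector-similarity search."""
    words = set(query.lower().split())
    ranked = sorted(DOCUMENTS,
                    key=lambda d: -len(words & set(d.lower().split())))
    return ranked[:k]

def augmented_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(augmented_prompt("what's nice in dc"))
```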
The speaker presented a formula, based on a dot product, for calculating the “tipping point”: the point at which an AI begins to move toward a bad or harmful output. He discussed how changing the prompts submitted to an AI can change its tipping point, including the use of polite words such as “please”. He claimed that, using his formula, we can now have more informed discussions about “prompt engineering”. He presented a two-dimensional representation of the embedding locations of three words in ChatGPT: “Earth”, “flat”, and “round”. He claimed that the embedding distance between “Earth” and “flat” is very close to the embedding distance between “Earth” and “round”, raising the possibility of the AI generating an incorrect result for some prompts about the shape of the Earth. He explained that the proximity of these three words in embedding space reflects the fact that the AI was trained on social media, where many people assert that the Earth is flat.
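The Earth/flat/round point can be illustrated with a toy cosine-similarity check; the embedding values below are invented for illustration and are not ChatGPT’s actual embeddings:

```python
import numpy as np

# Invented embeddings: "flat" and "round" sit almost equally close to
# "Earth", as can happen when the training data contains both claims.
emb = {
    "Earth": np.array([1.00, 0.00]),
    "round": np.array([0.80, 0.60]),
    "flat": np.array([0.78, -0.63]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for word in ("round", "flat"):
    print(f"cos(Earth, {word}) = {cosine(emb['Earth'], emb[word]):.3f}")
# Near-equal similarities mean a small change in the prompt can tip the
# output toward either continuation: the "tipping point" behavior.
```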
The speaker ended his talk by commenting on potential future issues with AI, including problems with scaling (as AIs become much larger), stochastic algorithms (which generate random results), and “agentic AI” (systems that can autonomously plan and execute actions with limited human intervention).
The lecture was followed by a Question and Answer session.
A member asked about the similarities between electronic neural networks and the neural network of the human brain. Johnson responded, “Structurally zero…no connection at all.” He said that functionally they are quite similar, particularly in terms of the “attention mechanism”: both AIs and the human brain have limited attention, so attention on one particular idea or input comes at the expense of others.
A member asked about the ultimate limits on the scaling of AIs: if we had unlimited electrical power and significant economic drivers, how big could they get? Johnson responded, “I don’t know…I don’t think anyone knows.” He said he could see no reason why AI would not continue to grow at an increasing pace.
A member on the live stream asked how the values for the vectors G (good) and B (bad) are set. Johnson responded, “The challenge is [determining] what’s good and bad.” He said that for questions about the shape of the Earth, we know what good and bad answers are, and that for something like tax compliance, a priori knowledge of good and bad is easily obtained; for mental health advice, it is not so clear.
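One plausible way to operationalize G and B, offered purely as an illustration consistent with this discussion (not the speaker’s stated procedure), is to average the embeddings of exemplar answers whose labels are already known and compare dot products:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding model: a deterministic pseudo-random vector.
    A real embedding model would place similar texts at nearby points."""
    seed = sum(ord(c) for c in text)
    return np.random.default_rng(seed).normal(size=8)

# Hypothetical construction: G and B as the mean embeddings of answers
# already known to be good or bad (easy for the shape of the Earth,
# much harder for domains like mental health advice).
good_examples = ["The Earth is round.", "The Earth is an oblate spheroid."]
bad_examples = ["The Earth is flat.", "The Earth is a disc."]

G = np.mean([embed(t) for t in good_examples], axis=0)
B = np.mean([embed(t) for t in bad_examples], axis=0)

# Score a new output by which direction it projects onto more strongly.
# (With this toy embed(), the verdict is arbitrary; it is meaningful
# only with a real embedding model.)
output = embed("The Earth is approximately spherical.")
verdict = "good" if np.dot(output, G) > np.dot(output, B) else "bad"
print(f"output leans {verdict}")
```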
After the question and answer period, President Millstein thanked the speaker and presented him with a PSW rosette, a signed copy of the announcement of his talk, and a signed copy of Volume 17 of the PSW Bulletin. He then announced the speakers of upcoming lectures and made a number of housekeeping announcements. He adjourned the 2,523rd meeting of the Society at 9:58 p.m. ET.
Temperature in Washington, DC: 12.8° Celsius
Weather: Fair
Audience in the Powell auditorium: 112
Viewers on the live stream: 26
For a total of 138 live viewers
Views of the video in the first two weeks: 309
Respectfully submitted, Scott Mathews: Recording Secretary