I have more to say about the recent MIT/Harvard paper, which raises the question: are we getting close to AGI? My answer is now more emphatic: NO! We are not getting close to Artificial General Intelligence! And the upcoming GPT-5 won’t achieve it either. I think this paper helps us understand why.
LLMs don’t create world models from the data they are trained on. The researchers show that when trained on orbital data for our solar system’s planets, LLMs (which they call foundation models) are unable to infer the underlying physical forces from that data. They conclude that these types of AI generally do not have the ability to observe the world and create models of how it works.
Will future LLMs continue to amaze and be useful? Yes! But as Gary Marcus predicts, these new systems will have limitations that keep them from being AGI:
Alignment: Lack of solid alignment with human values and goals
Unpredictability: Unpredictable outputs that are hard to integrate with downstream systems (e.g., have you tried to get LLMs to reliably output well-structured JSON? see the sketch after this list)
Hallucinations: Frequent hallucinations will continue
Bias: Unnecessary bias will persist
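To make the unpredictability point concrete, here is a minimal sketch of what “hard to integrate with downstream systems” looks like in practice: the calling code can’t trust the shape of the reply, so it has to validate and retry. The call_llm function below is a hypothetical stand-in for whatever client you actually use, not any specific API.

```python
import json


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in your actual provider's client."""
    raise NotImplementedError


def get_structured_answer(prompt: str, required_keys: set, max_tries: int = 3) -> dict:
    """Ask for JSON, validate the reply, and retry when it doesn't parse or misses keys."""
    for _ in range(max_tries):
        raw = call_llm(prompt + "\nRespond with a single JSON object only.")
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # the model replied with prose or broken JSON; ask again
        if isinstance(data, dict) and required_keys <= data.keys():
            return data  # well-formed and complete; safe to hand downstream
    raise ValueError(f"No valid JSON object after {max_tries} attempts")
```

None of this retry scaffolding would be needed if the output were dependable, which is exactly the integration cost Marcus is pointing at.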
To these character flaws, I would add an inability to ensure truthfulness and a general lack of common sense (both of which would be helped by world modeling). As Marcus points out, more capable AI systems will likely combine techniques: LLMs, yes, but also world/cognitive models, explicit knowledge, and probably new, yet-to-be-developed architectures.
What the MIT/Harvard paper shows is that despite the recent (and deceptive) hype and AI’s rapid progress, the industry is NOT about to achieve AGI through its pursuit of LLM scaling and optimization. Without the ability to model the world, I would argue, LLMs cannot do many things that are fundamental to intelligence: extrapolate from the specific to the general, create and understand metaphor, form a theory of mind, and invent. I’m sure a philosopher could come up with many other missing characteristics of intelligence.
Alberto Romero wrote the article referenced in the repost I made the other day, and his longer Substack piece does a great job of breaking down the MIT/Harvard paper. It is long and comprehensive, but well worth reading in full. Here are a few quotes:
[a world model] allows you to go beyond the immediately observed phenomena and extrapolate the underlying cause to unobserved and seemingly unrelated scenarios. Humans do this all the time, since we are toddlers. If you see someone pour juice into a cup and then knock the cup over, you understand, even without seeing the spill, that the juice will be on the floor. You don’t need to witness the outcome more than once to know it. That’s because you’ve internalized the world model: “liquid in cup + tipping = spill.” Once you grasp that, you can predict all sorts of situations that you have never encountered.
He quotes from the paper: “A foundation model [LLM] uses datasets to output predictions given inputs, whereas a world model describes state structure implicit in that data.”
Kepler inferred the movement of the planets from data on past trajectories (inputs) to predict future trajectories (outputs) using geometric methods. Newton took those empirical patterns and showed they were consequences of deeper principles: universal gravitation and the laws of motion. He introduced a unified framework that connected falling apples to orbiting moons, invented calculus to describe continuous change, and gave us an explanatory world model—the “state structure implicit in that data”—of physical reality. It was Newton, not Kepler, who uncovered the underlying dynamics between forces, masses, acceleration, and motion, which made sense of the data.
Romero says of the authors: “They concluded that AI models make accurate predictions but fail to encode the world model of Newton’s laws and instead resort to case-specific heuristics that don’t generalize.”
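To make that distinction concrete, here is a toy contrast (my illustration, not the paper’s actual experiment): a Newtonian world model computes the next state of any orbit from the underlying force law, while a case-specific heuristic can only return the recorded successor of the most similar state it has already seen.

```python
# Toy contrast: a world model vs. a case-specific heuristic (illustrative only).

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # mass of the Sun, kg


def newton_step(x, y, vx, vy, dt=3600.0):
    """World model: the next state follows from a = -G*M*r/|r|^3, for ANY body or orbit."""
    r3 = (x * x + y * y) ** 1.5
    ax, ay = -G * M_SUN * x / r3, -G * M_SUN * y / r3
    vx, vy = vx + ax * dt, vy + ay * dt
    return x + vx * dt, y + vy * dt, vx, vy


def heuristic_step(state, memorized_pairs):
    """Case-specific heuristic: return the recorded successor of the most similar
    state seen in training. Accurate on familiar trajectories, lost off-distribution."""
    _, successor = min(
        memorized_pairs,
        key=lambda pair: (pair[0][0] - state[0]) ** 2 + (pair[0][1] - state[1]) ** 2,
    )
    return successor
```

The first function encodes the “state structure implicit in that data”; the second only mirrors the trajectories it was fit to, which is roughly the distinction the authors are drawing about foundation models.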
This AI problem goes beyond deriving classical physics as Newton did. To me, a more important kind of world modeling is the ability to have a “theory of mind,” i.e., why do people, animals, or other systems do what they do? This encompasses complex things like understanding (and modeling) goals, motivations, emotions, and many other characteristics of smart systems that drive action. Beyond a theory of mind, AGI will likely need a “theory of life” to deal with everything beyond people and animals, such as gravity, weather, traffic jams, emergencies, and most novel situations.
A few other areas that I hope will become topics of future research:
Metaphors: World models give humans the ability to create metaphors like “over the hill” (hat tip to cognitive linguist and philosopher George Lakoff) which are useful in cross-domain thinking and empathy.
Imagination: I believe that imagination is a kind of world modeling that’s important to intelligence
Curiosity: A key part of intelligence — humans seem to innately want to understand how things work, and building a model requires a curious, embodied engagement with the world that improves perception
Judgement: Requires modeling — i.e., running a scenario in your “head” with the help of models to see how things might play out in order to make decisions
Heuristics: The “rules” of a world model don’t need to be scientific to be useful, but can be generalizable heuristics based on experience
Let me know what you think in the comments!