The Future of Machine Intelligence, with Causality

Happy the human who has been able to learn the causes of things.
- Virgil

We probably need new paradigms of artificial intelligence to achieve generalizable, efficient, and reliable machine intelligence. Humans are vastly better than current systems on a few critical measures of performance. We are extremely efficient at zero-shot and one-shot concept learning. We develop reasoning principles that generalize. We run controlled experiments based on hypotheses. We spend time trying to figure out how the world works, not just to fulfill an immediate objective. And it is easy for us to work across many different datatypes: image, audio, language, relational, sequential.

Deep learning is bad at all of these [1] [2]. It might need hundreds of thousands of frames and thousands of epochs to achieve good performance on a single task. Once a task is learned, the basic principles behind it – even things like movement or physics – don’t generalize to a new environment. Most of machine learning misses the notion of controlled experiments in the first place. And we build our systems around a fixed dataset and problem scope, instead of letting them adapt to either.

My best guess is that human beings are unreasonably intelligent because we’ve developed efficient “meta-models” that we use to think about the world. To be clear, a meta-model is a model of how to update your models of the world (see more about “World Models” here: [3] [4]). Meta-models are to our thinking what different database types (relational, graph, time series, etc.) are to data analysis: a way of storing certain kinds of information that makes some analyses easier or more natural than others.

I’ve come to call these meta-models the “Human Structure Family”. There are at least four I think are very important: (1) Causality, or learning causal data-generating processes through experiments; (2) Psychological Modeling, or modeling the models of other agents; (3) Relational Reasoning, or thinking of the world fundamentally as objects and the relationships between them; (4) Learning to Learn, or a model of the best processes for updating models. (Are there more? Email me.)

Amazingly, the Human Structure Family is probably not neurophysiologically encoded in our brains. Deep learning has an (increasingly justified [5]) neurophysiological basis, but it would be difficult to point to a physical Causality Meta-model module in our squishy brains, or to a graph network in our heads that models objects and their relationships. Yet these meta-models appear to be almost as fundamentally useful as deep learning, and far more efficient and generalizable. In that sense, they may express a highly efficient prior about how to divide the world.

Causality is the HSF sibling that fascinates me the most. It seems to be a way of getting closer to the data generating process (DGP). If an intelligence can understand how the world works, it seems more likely that it will not only maximize its objective, but also learn general rules that transfer to other tasks or subtasks.
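
To make “getting closer to the DGP” concrete, here’s a minimal sketch (a toy example of my own, not from any of the cited papers) of a structural causal model in which passively observing and actively intervening give different answers, and only the experiment recovers the true causal effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy structural causal model (invented for illustration):
#   Z -> X, Z -> Y, X -> Y.  Z confounds the X -> Y relationship.
def sample(do_x=None):
    z = rng.normal(size=n)
    x = 2.0 * z + rng.normal(size=n) if do_x is None else np.full(n, do_x)
    y = 3.0 * x + 5.0 * z + rng.normal(size=n)
    return x, y

# Observational model: the regression slope of y on x is ~5.0,
# biased upward because the confounder Z moves both variables.
x, y = sample()
obs_slope = np.cov(x, y)[0, 1] / np.var(x)

# Experimental answer: set x by fiat (the do-operator) and compare.
# This recovers the true causal coefficient, ~3.0.
_, y0 = sample(do_x=0.0)
_, y1 = sample(do_x=1.0)
causal_effect = (y1 - y0).mean()

print(f"observational slope ~ {obs_slope:.2f}, causal effect ~ {causal_effect:.2f}")
```

The regression answers “what does Y look like when X happens to be high?”; the experiment answers “what happens to Y if we set X?” A causal intelligence cares about the second question.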

A good causal intelligence would be able to decide what experiments it should run to learn most efficiently; think about missing variables that could bias learning; have an object-relational understanding of the world; and decompose the world into entities based not on their properties but on their interactions with other entities. It would trade off reward maximization against experimenting out of curiosity to learn how the world works, which is something we humans do. (I am extremely interested in the new wave of curiosity-driven learning [6] [7]; a sketch of the core idea follows below.)
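
As a bare-bones illustration of that curiosity idea – a drastically simplified version of the prediction-error bonus used in the curiosity-driven learning literature, with a linear model and names of my own invention:

```python
import numpy as np

# Curiosity bonus sketch: intrinsic reward = squared error of a learned
# forward model predicting the next state. The linear model and all
# names here are illustrative simplifications, not a published method.
class CuriosityBonus:
    def __init__(self, state_dim, action_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim + action_dim))  # forward model
        self.lr = lr

    def reward(self, state, action, next_state):
        inp = np.concatenate([state, action])
        err = next_state - self.W @ inp          # prediction error
        self.W += self.lr * np.outer(err, inp)   # one online SGD step
        return float(err @ err)                  # badly predicted = interesting

# Usage: add the bonus to the environment reward inside an RL loop.
bonus = CuriosityBonus(state_dim=4, action_dim=2)
rng = np.random.default_rng(0)
s, a = rng.normal(size=4), rng.normal(size=2)
s_next = s + 0.1 * np.concatenate([a, a])        # toy dynamics
r_intrinsic = bonus.reward(s, a, s_next)
```

The agent adds `r_intrinsic` to the environment’s reward, so it is paid to visit states its world model still predicts badly – a crude version of experimenting to find out how the world works.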

So, how do we make progress on incorporating causality into our quest for intelligence? The most important thing we could do is abandon our dogmatic beliefs about causality. For too long, causality has been a synonym for “causal inference”, where not performing it perfectly lands you in trouble with econometricians and statisticians. If we see causality less as an identification task and more as a toolbox full of shiny new instruments (ha) that help us learn approximately about the world, I think we have a good shot at accelerating our path to general intelligence.
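
To make the “instruments” pun concrete, here’s one tool from that toolbox: a toy instrumental-variables estimate (numbers invented for illustration) that recovers a causal effect a naive regression gets wrong.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Toy setup, invented for illustration. U confounds X and Y; Z is an
# instrument: it moves X but affects Y only through X.
u = rng.normal(size=n)                          # unobserved confounder
z = rng.normal(size=n)                          # instrument
x = z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)      # true causal effect of X is 2.0

naive = np.cov(x, y)[0, 1] / np.var(x)          # ~3.0, biased by U
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]    # Wald/IV estimate, ~2.0

print(f"naive regression ~ {naive:.2f}, IV estimate ~ {iv:.2f}")
```

The point of the toolbox view: you don’t need a perfect identification argument to let a learner exploit tools like this for approximately-correct answers about how the world works.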

All this ignores, however, the prime importance of having the right data in the first place. The true test of these ideas will be a system that can go out into the world and collect the data it wants, rather than being bound by the problem spaces and data limitations that we humans create. This might be a Terminator-esque robot body. More likely, it’s entirely software-based: instead of walking around, it has a human-like curiosity for knowledge, uses the internet to pursue its hypotheses, and decides what data it wants to collect. Luckily, the trend of Sensors Everywhere on Everything™ will make autonomous data collection, problem generation, and learning a lot easier. Whatever framework wins will be the one best suited to learning and reasoning in a world of endless data and questions, a world that non-human intelligences interact with directly.

While I’m not sure how we create the mythical “general intelligence”, I think taking inspiration from what humans do behaviorally is a good first step.


If you’d like more resources on advances in the Human Structure Family, or on Causality, I recommend:

  1. Building Machines That Learn and Think Like People (and the DeepMind follow-up response)
  2. Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution (Judea Pearl, 2017)
  3. Causality by Judea Pearl
  4. Elements of Causal Inference – a hardcore primer to causality techniques in ML.
  5. Schema Networks: Zero-shot transfer learning using causal networks, compared to DeepMind’s DQN