On the Future of Machine Learning, and Causality

I’m increasingly coming to believe that we’re going to need new paradigms of artificial intelligence to achieve generalizable, efficient, and reliable intelligence. The most compelling new paradigms, to me, are the ones that capture the sort of structure we humans seem to be good at using:

  1. Causality.
  2. Psychological reasoning.
  3. Compositionality.
  4. Learning to learn.

For the sake of it, let’s call these four tools the Human Structure Family. I’ve given examples of each at the end of this post. This is not a complete list, but each of these four components is critical to the way we think and reason about the world.

That is, we fundamentally reason about the world in terms of experiments, empathy, concepts, and rapid updates to our internal mental models. Why did this happen? What happens if I do this? What does that person want? How are those two objects/concepts connected? What do I need to do to learn how to perform well in this brand new environment?

The forefront of current ML techniques hits barriers that these approaches do not. Deep Learning is extremely powerful and expressive, but its structure is inflexible once pre-specified, which leads to poor generalization. Reinforcement Learning allows greater flexibility, but is extremely sample-inefficient. If a reinforcement learner could run experiments in its environment, it could extract high-certainty information in a single training step, rather than over tens of thousands.
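
To make that efficiency point concrete, here is a minimal sketch (the setup, names, and numbers are entirely my own hypothetical illustration): an agent learning only from logged, observational data is fooled by a hidden confounder, while a single round of intervention recovers the true effect immediately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a hidden confounder U drives both the logged
# action A and the reward R, so passive data overstates A's effect.
def observe(n):
    u = rng.integers(0, 2, n)           # hidden confounder
    a = u                               # the logging policy just copies U
    r = u + rng.normal(0, 0.1, n)       # reward depends only on U, not A
    return a, r

def intervene(a_val, n):
    u = rng.integers(0, 2, n)           # U still varies, but A is forced,
    r = u + rng.normal(0, 0.1, n)       # severing the U -> A link
    return np.full(n, a_val), r

a, r = observe(10_000)
print("observational:", r[a == 1].mean() - r[a == 0].mean())  # ~1.0 (spurious)

_, r1 = intervene(1, 10_000)
_, r0 = intervene(0, 10_000)
print("interventional:", r1.mean() - r0.mean())               # ~0.0 (true effect)
```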

For these reasons, methods built around the Human Structure Family are growing in popularity.

Do we really need, however, to build our techniques on top of these building blocks, or can we learn them automatically? That is, can we start from general mechanisms (deep architectures, for example) and recover causality, psychological reasoning, compositionality, and learning to learn? It’s a great question. I’m personally tempted to research whether RNNs actually capture causal relations, or whether the learned compression in GANs and VAEs resembles the causal data-generating process. Research should both further the HSF’s inclusion in modern ML and explore whether we need to hardcode it in the first place.

Personally, I focus my research on causality because it seems the most fundamental of the four. But this has been tried before. The ML community focused heavily on Bayesian networks in the 1980s. Then, in the 1990s, Judea Pearl expanded Bayesian networks into a set of tools for causal reasoning (see his book Causality, listed in the further reading below). These tools quickly fell out of fashion as kernel methods and deep learning proved better at capturing structure in high-dimensional, high-volume data. New advances in causality (to be saved for another post) put causality back on the same plane of efficacy in high-dimensional, high-volume settings.
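
For a flavor of what Pearl’s tools add on top of a plain Bayesian network, here is a minimal sketch. The three-variable network Z → X, Z → Y, X → Y and its probabilities are made up for illustration: naive conditioning on X mixes in the confounder Z, while the backdoor adjustment recovers the interventional quantity P(Y | do(X)).

```python
# Toy Bayesian network Z -> X, Z -> Y, X -> Y with hypothetical CPTs,
# chosen only to make the confounding visible.
p_z = {0: 0.5, 1: 0.5}                       # P(Z=z)
p_x_given_z = {0: 0.1, 1: 0.9}               # P(X=1 | Z=z)
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.4,    # P(Y=1 | X=x, Z=z)
                (1, 0): 0.3, (1, 1): 0.6}

# Naive conditioning: P(Y=1 | X=1) mixes X's effect with Z's.
num = sum(p_y_given_xz[(1, z)] * p_x_given_z[z] * p_z[z] for z in (0, 1))
den = sum(p_x_given_z[z] * p_z[z] for z in (0, 1))
print("P(Y=1 | X=1)     =", num / den)       # 0.57

# Backdoor adjustment: P(Y=1 | do(X=1)) = sum_z P(Y=1|X=1,Z=z) P(Z=z).
do = sum(p_y_given_xz[(1, z)] * p_z[z] for z in (0, 1))
print("P(Y=1 | do(X=1)) =", do)              # 0.45
```

The two numbers disagree precisely because Z opens a backdoor path; the adjustment formula closes it.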

Some object that this work does an injustice to causality, which is a rigorous topic well studied by statisticians and econometricians. My view is that we should borrow from its toolbox and relax some of its restrictions. Other opponents argue that by imposing Human Structure we are abandoning the dream of automated intelligence and returning to hands-on GOFAI (“Good Old-Fashioned Artificial Intelligence”: hardcoding decision principles rather than learning them). First, that description isn’t accurate of the new methods. Second, if it were accurate, it would be equally accurate of neural networks, whose architectures also hardcode assumptions.

As a last note, if we crack causality, we are much closer to cracking automatic scientific discovery. If we crack automatic scientific discovery, we might exponentially accelerate technological development, which would be one of the most welfare-generating inventions in human history. Some companies are already working on this.

Further reading I highly recommend:

  1. Schema Networks: Zero-shot transfer learning using causal networks, compared to DeepMind’s DQN
  2. Building Machines That Learn and Think Like People (and the DeepMind follow-up response)
  3. Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution (Judea Pearl, 2017)
  4. Causality by Judea Pearl
  5. Elements of Causal Inference – a hardcore primer to causality techniques in ML.

Examples of the Human Structure Family

Causality – As we grow, we run little experiments to determine strong rules of thumb connecting general concept A and general concept B. For example, we learn that placing a hand on a hot stove leads to pain. But we also experiment with interventions, which tell us extremely valuable information about the nature of the relationship we’ve discovered: if we turn off the gas, we do not get burnt. Through causal reasoning, we only have to learn this once to abstract a causal relationship that generalizes reliably to other situations (say, barbecuing).
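
A minimal sketch of the stove example as a structural causal model (the variable names and mechanisms are hypothetical illustrations): an intervention do(gas = off) replaces one mechanism while leaving the others intact, which is exactly why the learned heat-causes-pain link transfers unchanged to a barbecue.

```python
# Toy structural causal model: gas -> hot -> pain (given hand contact).
def scm(hand_on_surface, gas_on, do_gas=None):
    gas = gas_on if do_gas is None else do_gas   # do() overrides the mechanism
    hot = gas                                    # surface is hot iff gas flows
    pain = hot and hand_on_surface               # burns need heat AND contact
    return pain

print(scm(hand_on_surface=True, gas_on=True))                # True: ouch
print(scm(hand_on_surface=True, gas_on=True, do_gas=False))  # False: safe
# The same (hot -> pain) mechanism applies as-is to a barbecue grill.
```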

Psychological reasoning – We also make decisions by predicting the intentions of others. When we drive, we try to predict what other cars will do. We think of cars as being driven by human drivers, rather than as rule-based agents in themselves, so we might predict irrationality from a particularly erratic driver, for example. For the same reason, we might find it difficult to adjust to predicting the actions of self-driving cars.

Compositionality – We see the world not as a Convolutional Neural Net does (a pixel field with layers of structure), but in terms of distinct concepts we have learned as useful representations of the world. For example, we think of a car as the key object in explaining why a crash happened, rather than the groups of molecules spread across its hood. Capsule Networks seem to be built on this principle of compositionality.
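
As a toy illustration of the difference (entirely my own hypothetical example), an object-centric representation lets an explanation refer to a handful of discrete entities and their relations rather than to pixels:

```python
from dataclasses import dataclass

# Hypothetical object-centric scene description.
@dataclass
class Entity:
    kind: str
    speed: float  # m/s

def explain_crash(scene):
    # The explanation lives at the level of objects, not pixels.
    movers = [e for e in scene if e.speed > 0]
    still = [e for e in scene if e.speed == 0]
    if movers and still:
        return f"the {movers[0].kind} hit the {still[0].kind}"
    return "no crash to explain"

print(explain_crash([Entity("car", 16.0), Entity("wall", 0.0)]))
# -> "the car hit the wall"
```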

Learning to learn – One of our key strengths is that we are good at generating and modifying our internal models of the world on the fly. For example, we learn to swim in a few lessons (i.e. via the transmission of concepts that are causal in nature), and we retain our understanding of swimming for years afterward.
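
To sketch what learning to learn looks like in code, here is a minimal Reptile-style meta-learning loop (my own illustration, on hypothetical toy tasks, not a method from this post): meta-training across many related regression tasks yields an initialization that adapts to a brand-new task in a handful of gradient steps, while a cold start barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_steps(theta, x, y, lr=0.05, steps=5):
    # A few plain gradient steps on squared error for y ~ theta[0]*x + theta[1].
    for _ in range(steps):
        pred = theta[0] * x + theta[1]
        grad = np.array([((pred - y) * x).mean(), (pred - y).mean()])
        theta = theta - lr * grad
    return theta

theta = np.zeros(2)                      # meta-parameters: the shared prior
for _ in range(2000):                    # meta-training over sampled tasks
    a, b = rng.normal(3, 0.3), rng.normal(2, 0.3)  # task-specific truth
    x = rng.uniform(-1, 1, 20)
    phi = sgd_steps(theta, x, a * x + b) # adapt to this one task
    theta += 0.1 * (phi - theta)         # Reptile update: nudge toward phi

# On an unseen task, five steps from the meta-learned init nearly solve it;
# the same five steps from a cold start barely move.
x = rng.uniform(-1, 1, 20)
y = 3.1 * x + 1.9
print("meta-learned init ->", sgd_steps(theta, x, y))
print("cold start        ->", sgd_steps(np.zeros(2), x, y))
```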