From what to why: the rise of causal AI
By Idris Houir Alami. Edited by Anya Brochier, Louisa Mesnard and Marc Rougier.

We’re in the middle of a rising tide.
Over the past few years, we’ve seen the swell of generative AI, followed closely by the surge of agentic systems: autonomous workflows, AI copilots, and orchestration layers powered by LLMs. Hype is peaking, and capital is flooding in. Every company is trying to catch the wave.
But we’ve seen this story before.
In the late 2010s, autonomous vehicles were hailed as the next definitive step in AI. Robotaxis were expected to be ubiquitous by 2020. Companies like Tesla, Uber, Cruise, and Ford poured billions into the vision. But by 2025, Tesla’s robotaxis in Austin still rely on remote safety drivers, and many major players have scaled back or paused deployments entirely following safety incidents and regulatory pressure. It looked unstoppable, until it broke against reality.
That’s why being a good investor isn’t just about surfing what’s hot, it’s about scanning the horizon. Because for every wave that breaks, a new one is already forming. And while we can’t predict the future, we can recognize the early signs of something important.
We believe causal AI might be one of those next waves of innovation.
This article is our take on why causality matters, where it’s already in motion, and how it could shape the next evolution of AI by grounding it in understanding.
Prologue: the great misunderstanding of AI
AI today is powerful, but fundamentally limited. Not because models aren’t sophisticated, but because they’re built on a fragile assumption: that patterns in data (correlation) are the same as understanding how the world works (causation). They are not, and that gap matters:
Correlation ≠ Causation.
Let’s make this painfully clear, with a bit of dark humor.
When cheese predicts memes
Here’s a real, statistically robust correlation:

As Americans consumed more cheese, the popularity of the “this is fine” meme skyrocketed.
Absurd? Absolutely, to a human. But to a machine learning model?
It sees two features trending together. High correlation. Strong signal. So it learns that more cheese = more meme engagement and, like the dog in the meme, concludes that this is fine.
Why this is NOT fine
In a static environment, spurious correlations can appear to “work.” The model picks up on signals and optimizes accordingly. Error drops, confidence rises: everything looks fine. However, once the environment shifts, the model fails.
As soon as the cheese-meme relationship stops holding (as it inevitably will), the model starts producing garbage outputs, because it never understood why the pattern existed in the first place.
Spurious signals can do more than confuse a model: they can reinforce bias, discrimination, and systemic injustice.
Take large language models. In a recent study, researchers showed that LLMs consistently link gender to professional identity in stereotypical ways. Ask the model to generate a sentence about a nurse, and it’s likely to say “she.” Ask about a CEO, and it’s “he.” Not because the model understands these roles, but because it reflects biased patterns in the data it consumed.
Why? Because these models don’t know what’s causal, what’s contextual, or what’s structural. They treat correlation as explanation. And this isn’t an isolated issue: bias is pervasive across all large language models.
Giskard, an Elaia portfolio company, is tackling this problem head-on. Their mission is to systematically test LLMs for safety vulnerabilities, including bias, toxicity, and misinformation.
In their latest collaboration with Google DeepMind, they launched the Phare LLM Benchmark, a comprehensive evaluation framework that scores language models across a range of safety dimensions, bias included.

The results show clear progress, with some models exceeding 90% on stereotype resistance, but also highlight how wide the performance gap remains across providers.
And crucially: this metric only captures a narrow slice of the broader bias problem. Stereotype detection is just one dimension; bias can also appear in more implicit, structural, or contextual forms.
That’s the real danger. Without causal reasoning, AI systems risk entrenching ill-informed assumptions.
Why causal AI, and why now?
From correlation to causation
This is where causal AI enters the picture.
Instead of asking what usually happens when we see X, causal models ask what would happen if we did X. It’s a subtle but crucial difference that turns passive pattern recognition into active decision-making.
As Daniel Kim, part of the CEO office at causaLens, put it during our interviews:
“Traditional machine learning answers ‘what’, causal AI answers ‘why’.”
Increasingly, it's the question businesses are asking. Companies are looking to simulate actions, evaluate interventions and make better decisions based on cause and effect.
AI that acts needs causality
Recent breakthroughs have brought the promise of real artificial intelligence closer than ever. But we are already entering a new phase: the action era, where AI systems don’t just analyse the world, they act in it.
And that’s a big shift, because the moment we let AI systems take real-world actions, we raise the stakes. Acting based purely on correlation, without understanding how the world actually works (i.e. causality), can lead to biased or even harmful decisions.
Take autonomous agents, which make decisions on our behalf: financial, operational, even medical. It is difficult to trust those decisions if the system doesn’t reason about the world it’s trying to change.
In robotics: interacting with uncertain environments and adapting in real-time. That requires understanding how actions lead to consequences, i.e. causal structure.
In short: correlation is a good measure of what happened in the past, but causality is essential for future decision-making.
From why to how
But what does it actually look like in practice? How does one “model causality”?
To answer that, we need to open the black box and look at the underlying principles, from statistical theory to the algorithms powering the first generation of causal AI startups.
What is causal AI, really?
Not a model: a mindset
Causal AI is not a single algorithm or architecture. It’s a way of structuring intelligence around a simple, hard question:
What is the effect of an action?
Where traditional ML models identify patterns in past data, causal AI is designed to simulate interventions and reason about alternate realities. Instead of asking, “What’s likely to happen next?”, it asks:
Causal questions, i.e. “What caused this to happen?” (identifying parent-child relationships in the data)
Counterfactual questions, i.e. “What would have happened if we had done something else?” (reasoning about parallel scenarios for which no data exist)
Interventional questions, i.e. “What can I change to improve the outcome?” (choosing the best course of action)
Answering these questions requires a fundamentally different approach to learning.
At the heart of causal AI are two major schools of thought, both rooted in decades of statistical research.
(🧠 For those curious to dive deeper into the technical details, we’ve included additional explanations & resources at the end of this paper.)
Potential outcomes & The Neyman-Rubin framework
Imagine a patient takes a new drug and recovers.
Great news, but did the drug actually help?
Maybe they would have recovered anyway. Maybe the drug made no difference. Or maybe it helped a lot.
This is the core challenge of causal inference: we only observe what actually happened, in this case, recovery after taking the drug. But we’ll never know for sure what would’ve happened if they hadn’t taken it.
In causal terms, each patient has two “potential outcomes”:
One if they receive the drug (the treatment)
One if they don’t (the control)

The Neyman-Rubin Framework is built around this simple idea:
Every individual has two possible outcomes, but only one we observe. The other remains a counterfactual, an alternate timeline we can’t access directly.
Causal inference is about estimating the one we didn’t see, or haven’t seen yet.
We can’t rewind time and try both scenarios. But by observing enough people (some who got the drug, some who didn’t) and adjusting for differences (like age, health history, etc.), we can estimate what would have happened under the alternate scenario.
This lets us calculate things like:
Average treatment effect (ATE): Did the drug help overall?
Conditional average treatment effect (CATE): Did it help more for some people than others?
That’s the engine powering a wide range of real-world breakthroughs, from clinical trials and policy design to personalized medicine, and even marketing and product experiments!
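To make this concrete, here is a minimal, purely illustrative sketch in Python (all numbers invented). Assuming the drug is randomly assigned, treated and control groups are comparable, so the ATE reduces to a simple difference in recovery rates:

```python
import random

random.seed(0)

# Purely illustrative numbers: the drug truly raises recovery
# probability from 0.5 to 0.7, i.e. a true ATE of 0.2.
def simulate_patient(treated: bool) -> int:
    p_recover = 0.7 if treated else 0.5
    return 1 if random.random() < p_recover else 0

treated = [simulate_patient(True) for _ in range(100_000)]
control = [simulate_patient(False) for _ in range(100_000)]

# With random assignment, the difference in group means
# estimates the average treatment effect.
ate = sum(treated) / len(treated) - sum(control) / len(control)
print(f"Estimated ATE: {ate:.2f}")  # close to the true 0.2
```

Estimating the CATE works the same way, computed within subgroups defined by X (age, health history, etc.). Without random assignment, the same difference in means would be confounded, which is what adjustment methods exist to fix.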
Structural Causal Models & Directed Acyclic Graphs (DAGs)
Popularized by Judea Pearl, this school represents the world as a network of variables with directional arrows: A → B → C.
These DAGs encode your assumptions about which variables cause what, and enable intervention reasoning using the formal tool of do‑calculus.
Let’s take a simple example.
People who sleep with their shoes on tend to sleep poorly.
Does that mean the shoes cause bad sleep?
Unlikely. The real culprit is often alcohol: people who are drunk are more likely to pass out with their shoes on, and to sleep poorly.
So the true causal structure is:

Drinking → Shoes on
Drinking → Poor sleep
Shoes and poor sleep are correlated, but only because we are comparing two groups that aren’t equivalent. Most people who sleep with their shoes on are drunk, and most who don’t are sober. The real cause of poor sleep is not shoes, it’s the drinking.
If we want to know what would happen if we forced someone to sleep with shoes on, written as P(Poor sleep | do(Shoes = on)), we need to block the influence of the confounder (i.e. drunkenness).
When we do that, we may still find a non-zero causal effect (sleeping with shoes on isn’t exactly comfortable), but the impact will likely be much smaller than what naïve correlation suggests.
This is the power of causal inference: it gives you a realistic estimate of effect size, not one distorted by hidden biases. And it does so without requiring a randomized trial, as long as your causal assumptions hold.
The DAG approach shines when you want transparency, interpretability, and the ability to reason about complex systems, especially in scientific or industrial settings.
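A minimal simulation (all probabilities invented) makes the adjustment concrete: the naive shoes-vs-no-shoes comparison is inflated by drunkenness, while stratifying on the confounder recovers the small true effect:

```python
import random

random.seed(1)

# Invented numbers: drunkenness drives both shoes-on and poor sleep;
# shoes themselves add only a small discomfort effect (+0.05).
people = []
for _ in range(200_000):
    drunk = random.random() < 0.3
    shoes_on = random.random() < (0.7 if drunk else 0.05)
    p_poor = (0.6 if drunk else 0.1) + (0.05 if shoes_on else 0.0)
    people.append((drunk, shoes_on, random.random() < p_poor))

def poor_sleep_rate(rows):
    return sum(r[2] for r in rows) / len(rows)

# Naive comparison: mixes drunk and sober people, so it
# massively overstates the effect of shoes.
naive = (poor_sleep_rate([p for p in people if p[1]])
         - poor_sleep_rate([p for p in people if not p[1]]))

# Backdoor adjustment by stratification: compare within each
# drunkenness stratum, then average the effects by stratum size.
adjusted = 0.0
for level in (True, False):
    stratum = [p for p in people if p[0] == level]
    effect = (poor_sleep_rate([p for p in stratum if p[1]])
              - poor_sleep_rate([p for p in stratum if not p[1]]))
    adjusted += effect * len(stratum) / len(people)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")  # adjusted ≈ 0.05
```

Stratification is the simplest form of backdoor adjustment; reweighting and regression-based adjustment generalize the same idea to many confounders.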
From Theory to Practice: How causal AI actually works
In practical terms, building a causal AI system typically involves three stages:
Model the causal structure: using either domain expertise or algorithms that infer it from the data
Estimate the causal effect: using statistical estimators like ATE/CATE ([Conditional] Average Treatment Effect)
Simulate or optimize decisions: to find the best action based on causal reasoning
This can be done with open-source libraries like DoWhy or EconML, but it requires some technical skills.
That’s where startups like causaLens come in, making causal AI more accessible to enterprise teams.
The startup recently raised a $45M Series A to pursue their ambition to build AI agents that automate complex data workflows. These agents can perform classical ML tasks, but their true differentiator is their ability to embed and deploy causal AI algorithms.
They're powered by a proprietary causal inference library, developed in-house, which enables them to answer “what if” questions, simulate interventions, and optimize decisions in ways traditional ML simply can’t.
As they described during our interview:
“The idea of data scientist agents is what gets clients interested. But what keeps them is our causal engine, it unlocks better results, reveals new use cases, and gives them a reason to stay.”
By abstracting the complexity, causaLens is helping large enterprises take causal AI from research to deployment, across industries like finance, healthcare, and logistics.
A Shift already underway
Not just hype, but not yet peak
Causal AI isn’t some fringe academic dream anymore. It made Gartner’s 2024 Artificial Intelligence Hype Cycle, a key analysis that tracks the maturity and impact of emerging AI technologies.

While technologies like Generative AI dominate the headlines, causal AI is quietly gaining ground. The foundations are in place and tooling is maturing. Adoption is creeping up, especially in contexts where correlation has burned teams before. So, what's driving this acceleration?
Research & Big Tech: from theory to tools
In the past few years, causal AI has moved from academic theory to practical toolkits, driven by major investments from big tech and open-source communities. Here’s a snapshot of the players laying the foundations of causal AI:
Enterprise Adoption: Quiet but Real
All these tools have lowered the barrier for real-world adoption.
Today, companies are quietly starting to put causal AI to work to power smarter, more accountable decisions. From marketing teams to product ops, causal reasoning is making its way into everyday workflows, often behind the scenes, but with tangible impact.
A 2024 industry survey of 400 senior AI leaders, conducted by Databricks and Dataiku, found that 52% of AI-forward companies were already using or actively experimenting with causal AI.
And the momentum is growing: in the 2023 edition of the same report, causal AI ranked #1 among technologies companies planned to adopt next, with 25% naming it as a top near-term priority.


In practice, companies are learning that AI can automate routine tasks, but causal AI empowers strategic decisions. Here are some examples:
eBay discovered that correlation ≠ ROI. Their models captured clicks and churn, but not business value. This led them to invest in uplift modelling to target users who are truly influenced by interventions.
LinkedIn relies on simulation-based experimentation to measure causal lift and optimise product changes for actual impact, not just statistical significance.
In our interview with Ruomeng Cui, Associate Professor at Emory University, she explained how causal inference helped unlock real business value in Amazon’s supply chain operations:
“We initially used standard ML models to estimate the value of last-mile delivery, but the causal approach was significantly better at identifying which customers actually drove incremental sales. Classical ML ignored the counterfactual, and missed the lift.”
In business, when stakes and costs are high, correlation is not enough, and leaders are turning to causal tools to guide decisions.
Deep Dive: Marketing
If there’s one domain where causal AI is already making a visible impact, it’s marketing. From campaign targeting to churn prevention, causal methods offer what traditional machine learning can’t: clarity on which customers to target, and why.
Let’s walk through the difference with a practical example.
Beyond churn models: targeting the right users
Marketing teams often use churn models to identify which customers are at risk of leaving. These models score users based on their likelihood to churn, and target high-risk segments with retention campaigns.
But here’s the catch: high churn risk ≠ high response to intervention.
Take the classic uplift matrix, which categorises customers by whether they would buy (or stay subscribed) with or without treatment (e.g. receiving an offer email):

Sure Things: buy whether or not you treat them
Persuadables: buy only if treated
Sleeping Dogs: buy only if NOT treated
Lost Causes: won’t buy either way
Traditional ML targets users who look likely to churn, but that includes Lost Causes (who won’t stay no matter what) and Sleeping Dogs (who might churn because you intervened).
That means wasted budget, and in some cases, even a negative impact.
Causal AI flips the approach. As Naoufal Acharki, PhD in causal inference and founding engineer at Senzai AI, puts it:
“Instead of asking ‘Who is likely to churn?’, it asks: Who will churn if I don’t act, but stay if I do?”
Causal models estimate the treatment effect, finding the true delta between “treat” and “don’t treat.” In the matrix, this means identifying the anti-diagonal:
That’s where the Persuadables and Sleeping Dogs live, and real marketing ROI comes from targeting the former while sparing the latter.
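A minimal, pure-Python sketch of this idea (segment names and rates all invented): from a randomized campaign log, estimate per-segment uplift as P(stay | offer) − P(stay | no offer), then contact only segments with clearly positive uplift:

```python
import random
from collections import defaultdict

random.seed(2)

# Invented ground truth: (stay prob without offer, stay prob with offer).
TRUE_RATES = {
    "persuadable":  (0.30, 0.70),   # offer helps a lot
    "sleeping_dog": (0.80, 0.50),   # offer backfires
    "sure_thing":   (0.95, 0.95),   # offer changes nothing
    "lost_cause":   (0.05, 0.05),   # offer changes nothing
}

# Simulated randomized campaign log: (segment, got_offer, stayed).
log = []
for _ in range(100_000):
    seg = random.choice(list(TRUE_RATES))
    got_offer = random.random() < 0.5
    p_stay = TRUE_RATES[seg][1 if got_offer else 0]
    log.append((seg, got_offer, random.random() < p_stay))

# Per-segment uplift: P(stay | offer) - P(stay | no offer).
counts = defaultdict(lambda: [0, 0, 0, 0])  # stays_t, n_t, stays_c, n_c
for seg, got_offer, stayed in log:
    c = counts[seg]
    if got_offer:
        c[0] += stayed; c[1] += 1
    else:
        c[2] += stayed; c[3] += 1

uplift = {seg: c[0] / c[1] - c[2] / c[3] for seg, c in counts.items()}
target = [seg for seg, u in uplift.items() if u > 0.05]
print(uplift, target)  # only "persuadable" is worth contacting
```

In practice the segments are not given in advance: meta-learners (T-, X-, R-learners) estimate the same uplift score from individual user features.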
Replacing A/B Testing with Causal Tools
Causal AI doesn't just improve campaign targeting, it can go a step further by replacing A/B testing altogether, while solving some of its biggest flaws.
A/B testing has long been the standard for measuring marketing impact: randomly split a population into treatment and control groups, apply the campaign to one, and compare the outcomes. In theory, this should isolate the causal effect.
But in practice, randomization often fails to ensure balanced, comparable groups:
Samples might be skewed due to platform constraints
Users may self-select into one group (e.g. clickers vs. non-clickers)
The rollout mechanism (e.g. auctions, API layers, logged-in users) introduces hidden biases
In some situations, randomization isn’t just flawed, it’s simply not doable:
Ethical constraints: you can’t randomly assign people to smoke or not, to study lung cancer.
Feasibility limits: you can’t randomly turn countries into capitalist or communist states, to measure GDP effects.
Physical impossibilities: you can’t randomly change a person’s DNA at birth, to study breast cancer risk.
Causal inference tools allow us to reconstruct the randomization: using methods like reweighting, stratification, or synthetic control, we can simulate an unbiased comparison.
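One of these methods, reweighting, can be sketched in a few lines. Below, an illustrative inverse propensity weighting (IPW) simulation with invented numbers: engaged users both see the ad more often and convert more, so the naive exposed-vs-unexposed comparison overstates lift, while weighting each user by 1 / P(their exposure) reconstructs the randomized comparison:

```python
import random

random.seed(3)

# Invented setup: "engaged" users see the ad more AND convert more,
# so the naive comparison overstates the ad's true lift of 0.05.
rows = []
for _ in range(200_000):
    engaged = random.random() < 0.4
    p_ad = 0.8 if engaged else 0.2       # propensity of being shown the ad
    shown = random.random() < p_ad
    p_conv = (0.30 if engaged else 0.05) + (0.05 if shown else 0.0)
    rows.append((shown, random.random() < p_conv, p_ad))

naive = (sum(c for s, c, _ in rows if s) / sum(1 for s, _, _ in rows if s)
         - sum(c for s, c, _ in rows if not s) / sum(1 for s, _, _ in rows if not s))

# IPW: weight each user by the inverse probability of the exposure
# they actually received, rebuilding the experiment we never ran.
ipw_treated = (sum(c / p for s, c, p in rows if s)
               / sum(1 / p for s, _, p in rows if s))
ipw_control = (sum(c / (1 - p) for s, c, p in rows if not s)
               / sum(1 / (1 - p) for s, _, p in rows if not s))
ipw = ipw_treated - ipw_control

print(f"naive lift: {naive:.3f}, IPW lift: {ipw:.3f}")  # IPW ≈ 0.05
```

Here the propensities are known by construction; in real settings they are themselves estimated, e.g. with a classifier predicting exposure from user features.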
Take Criteo, an Elaia portfolio company.
They operate in an auction-based ad marketplace, where it’s hard to cleanly isolate the effect of showing (or not showing) a given ad. This makes A/B testing inherently biased.
As explained by a researcher from the Criteo AI Lab during our discussion:
“Clients were constantly asking for proof of ROI. But randomised experiments were difficult due to the auction mechanics. So we turned to causal inference.”
By reconstructing what would have happened had the ad not been shown, the team could estimate true incremental lift, and demonstrate value with rigor.
But he also highlighted a broader opportunity:
“At Criteo, we had the tech muscle to do this in-house. But for many non-tech companies, it’s hard to implement. That’s where external providers come in.”
That’s exactly what Decima2 is building.
Traditional A/B testing forces you to randomly assign users to different variants, even if one performs worse, just to measure impact. Decima2 flips the model: you run your campaigns as usual, and they apply fictive randomization after the fact. This lets them estimate causal impact without compromising real users, resulting in the insights of an A/B test without the business cost.
“With A/B testing, you knowingly give part of your audience a worse experience. With causal inference, you don’t have to. You can learn from what you already did.” — Alexis Monks, CEO of Decima2
Some exciting trends
Causal LLMs
Researchers are now exploring ways to embed causal models into LLMs, with the goal of making large language models not just predictive, but explanatory.
This new wave of research includes:
LLM4Causal: a system that equips LLMs with causal tools, allowing them to answer user queries about interventions, counterfactuals, and treatment effects.
PyWhy LLM: developed by Microsoft, this approach integrates LLMs into the causal analysis pipeline, helping non-experts access expert-level causal insights through natural language.
CausalPFN: a novel transformer-based model that estimates treatment effects across many scenarios without retraining, combining the speed of in-context learning with the rigor of causal methods.
Causal RL
Another fast-growing area is Causal reinforcement learning (RL), where researchers combine causal reasoning with reinforcement learning to improve generalisation.
Traditional RL learns through trial and error, often overfitting to specific environments and requiring huge amounts of data. By introducing causal structure (i.e. which actions actually lead to which outcomes), Causal RL helps agents learn faster, adapt to new settings, and make more reliable decisions.
This could be especially useful in robotics, helping close the sim-to-real gap by teaching agents the true causal dynamics of the physical world.
Some must-read material include:
ICML’s tutorial on Causal RL: makes the case for studying causality and reinforcement learning side by side, introducing the name “Causal Reinforcement Learning” (CRL, for short)
Causal Influence Detection for Improving Efficiency in RL (NeurIPS 2021): introduces a measure of situation-dependent causal influence to guide exploration and enhance data efficiency in robotic RL tasks
Why do we care?
Causal AI won’t just make our models better, it will make them more useful. It enables deeper moats through decision-centric intelligence, helps reduce bias by questioning proxy signals, and moves us from raw association toward real-world decision-making.
But despite decades of theory, we’re still early. As Ivano Lodato, founder & chief scientist of Allos (a causal-AI-enabled pharma company), said:
“Causal AI doesn’t need more math, it needs more people.”
Unlike the ML boom, which was unlocked by compute, the causal shift depends on awareness, talent, and adoption.
But let's be clear, that doesn’t mean causal AI is a solved science, far from it.
Or, to quote Yann LeCun:
“If humans were so good at causal inference, religion wouldn’t exist.”
There are still real limits: defining the right causal graph often requires strong assumptions, deep domain knowledge, or both. Validating those assumptions is even harder. And in many industries, like manufacturing or finance, where experimentation is costly or data is messy, deploying causal methods in practice can be complex.
That’s why we need to talk about it, research it, build tools, and support those doing the hard work of making it usable.
At Elaia, we look for what’s forming beneath the surface of the next wave of innovation. We don’t shy away from deeptech or complex scientific topics, we love to dive into them. So if you're building in causal AI, or anything you believe is the next wave, we'd love to hear from you.
Appendix
Potential Outcomes & The Neyman-Rubin Framework
Setting the stage
We consider a set of individuals indexed by i.
Each individual receives either:
Treatment: Tᵢ = 1
Control: Tᵢ = 0
Each individual has two potential outcomes:
Yᵢ(1): outcome if treated
Yᵢ(0): outcome if not treated
But we only observe one of them; the observed outcome is:

Yᵢ = Tᵢ · Yᵢ(1) + (1 − Tᵢ) · Yᵢ(0)
Each individual also has a set of observed pre-treatment features, Xᵢ (age, health history, etc.).
Estimating Treatment Effects
Imagine a patient who took a drug and recovered. Would they have recovered anyway without the treatment?
We’ll never observe both scenarios: only one is real. But the potential outcome framework tries to estimate the outcome under different treatments, even if they didn’t occur.
This is the foundation of modern causal inference in medicine, marketing, and economics, often expressed as:
Treatment Effect = What happened – What would have happened otherwise
Formally, we want to estimate the expected difference in outcome between treatment and control, given a profile X = x. We call this the Conditional Average Treatment Effect (CATE):

τ(x) = E[Yᵢ(1) − Yᵢ(0) | Xᵢ = x]

To estimate it, we define the two response surfaces:

μ₁(x) = E[Y | T = 1, X = x]
μ₀(x) = E[Y | T = 0, X = x]

so that τ(x) = μ₁(x) − μ₀(x).
Meta-Learners
Because we never observe both Y(1) and Y(0), we use ML models to estimate these functions.
This framework allows us to model treatment effects across a population, or even personalize them at the individual level, using meta-learners like the T-learner, X-learner, or R-learner, which blend causal logic with ML regressors like XGBoost or random forests.
Structural Causal Models & Directed Acyclic Graphs (DAGs)
Using do-calculus, two strategies apply:
Backdoor adjustment: if we can observe drunkenness, we adjust for it, comparing only equally sober (or equally drunk) people.
Frontdoor adjustment: if we can’t observe the confounder, but the treatment influences the outcome only through an observable mediator, we can estimate the effect through that mediator instead.
In both cases, DAGs help us see what to adjust for, and what not to, to answer “what if” questions from observational data.