Is it causality? Or is it only correlation?

When two or more system variables appear to correlate strongly, is there also a cause-and-effect relationship? This question has everyday as well as engineering applications. With the COVID pandemic raging and the national election season in high gear, we are inundated with data about politics, social systems, finance, science, and health, and we are left to sort through it all, usually on our own.

Which variables drive and which variables are driven? Which variables exhibit the same behavior and look related, but are in fact both driven by the same third influencing factor? Cause and effect can be extremely difficult to pin down without a scientific way to handle and explain all this data.
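The third-factor case described above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the variable names, the noise levels, and the helper functions `pearson` and `simulate` are invented for illustration. Two variables, x and y, never influence each other; both simply follow a hidden factor z. They still correlate strongly until x is set by hand, independently of z.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20_000, seed=0):
    """Return (observed correlation, correlation after intervening on x)."""
    rng = random.Random(seed)

    # Hypothetical hidden factor that drives both x and y.
    z = [rng.gauss(0, 1) for _ in range(n)]

    # Observation: x and y each follow z plus their own noise.
    # Neither variable appears in the other's formula.
    x = [zi + rng.gauss(0, 0.5) for zi in z]
    y = [zi + rng.gauss(0, 0.5) for zi in z]

    # Intervention: x is now set by hand, independent of z,
    # while y is generated exactly as before.
    x_set = [rng.gauss(0, 1) for _ in range(n)]
    y_after = [zi + rng.gauss(0, 0.5) for zi in z]

    return pearson(x, y), pearson(x_set, y_after)


if __name__ == "__main__":
    observed, intervened = simulate()
    print(f"observed correlation:   {observed:.2f}")
    print(f"after setting x freely: {intervened:.2f}")
```

With these assumed noise levels the observed correlation should come out near 0.8, while the correlation after the intervention should be close to zero. The data alone cannot distinguish the two situations; only knowing (or testing) how the variables are generated does.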

There is a well-known maxim that “correlation does not imply causation.” In other words, two variables that are closely related and share similar behavior are not necessarily cause and effect. The classic example is the rooster that crows at the first gleam of light: the crowing and the sunrise are strongly correlated, but the rooster does not cause the sun to rise. Experiment confirms it: artificial light will set a rooster crowing, and the sun will still fail to rise.

All the national polling being conducted and reported, along with the speculation and predictions accompanying it, got me thinking… How DO they build a model from all this data that makes sense?

Model-Based Systems Engineering (MBSE) is directly related. MBSE has emerged as a highly effective way to model and build complex systems, and it is gaining widespread acceptance as the go-to method for designing and building today’s complex systems.

Historically, statisticians (and mathematicians in general) have argued over how to represent causal relationships with a statistic that could be computed. This turned out to be far more difficult than originally thought, and the result is that standard statistics offers no good way to describe causality using probabilities alone. Formulas and tests can determine how strongly two variables are correlated, but whether they have a causal relationship has always been mathematically hard to describe. Causality was also deliberately left out of statistics by a strong faction that believes it has no place there.

This leaves model builders and, by extension, systems engineers who use Model-Based Systems Engineering facing big challenges. Correlation and causality can both serve as predictors, but each has its own hazards. Correlations can be coincidental, confounded, or weakly linked, which can make them extremely poor predictors. Causal relationships can lack the intermediary linking variables needed to establish the prediction because the entire causal chain is not accessible, measurable, or testable.

©2018 Judea Pearl

Mathematician and data scientist Dr. Judea Pearl has spent much of his career developing the concept of causality, along with new scientific methods for making causal inferences and for modeling the causality that may or may not exist between the state variables within a system. His work in this area of mathematics and data science is ground-breaking. His most recent book, The Book of Why: The New Science of Cause and Effect (©2018), written with Dana Mackenzie, is a readable work that is already described as foundational required reading for all data scientists. I think it should be required reading for all systems engineers as well.

Judea Pearl describes two languages he uses to establish this new science: causal diagrams to express what is known, and a symbolic language to express what we want to know. A causal diagram is a technique he developed to lay out causal relationships. He shows how to use causal diagrams to gain a deeper understanding of a system model, and how to gain even deeper knowledge by climbing an information structure that is underpinned by causal inference. In his book, he defines “The Ladder of Causation” and explains why traditional statistics can never adequately represent causation using sets of probabilities alone. Traditional statistics is therefore trapped at the association level, limited to producing models from simple correlations and regressions. Given causal diagrams and the symbolic language, both data scientists and systems engineers can climb through the three levels of causation (association, intervention, and counterfactuals), gaining deeper knowledge of a system at each level.
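To show how the first two levels can give different answers, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: a three-variable causal diagram (Z influences both X and Y, and X influences Y), invented probabilities, and hypothetical function names. It compares what we learn by seeing X (association) with what happens when we set X (intervention), using the adjustment over Z that Pearl's framework prescribes for a diagram of this shape.

```python
# Assumed causal diagram: Z -> X, Z -> Y, X -> Y (all variables binary).
# All probabilities below are invented for illustration.
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(X = 1 | Z = z)


def p_y1(x, z):
    """P(Y = 1 | X = x, Z = z): Z matters a lot, X only a little."""
    return 0.1 + 0.5 * z + 0.1 * x


def p_x_given_z(x, z):
    """P(X = x | Z = z)."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def p_y1_given_x(x):
    """Association: P(Y = 1 | X = x), from passively seeing X = x."""
    # Seeing X = x also tells us something about Z (Bayes' rule).
    joint = {z: P_Z[z] * p_x_given_z(x, z) for z in P_Z}
    total = sum(joint.values())
    return sum(p_y1(x, z) * joint[z] / total for z in P_Z)


def p_y1_do_x(x):
    """Intervention: P(Y = 1 | do(X = x)), adjusting for Z."""
    # Setting X by hand leaves the distribution of Z untouched.
    return sum(p_y1(x, z) * P_Z[z] for z in P_Z)


def association_vs_intervention():
    """Return (difference seen in data, difference caused by X)."""
    seen = p_y1_given_x(1) - p_y1_given_x(0)
    done = p_y1_do_x(1) - p_y1_do_x(0)
    return seen, done


if __name__ == "__main__":
    seen, done = association_vs_intervention()
    print(f"seeing X: difference in P(Y=1) = {seen:.2f}")
    print(f"doing X:  difference in P(Y=1) = {done:.2f}")
```

With these made-up numbers, the associational difference is 0.40 while the interventional difference is only 0.10. Most of the apparent effect of X on Y comes from Z, which is exactly the kind of distinction a causal diagram lets the modeler state and compute.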

How many System Engineers are aware of his work? 

How well is causality taught and dealt with in engineering curricula? The application and adoption of this new science by practicing engineers is critical to model building and systems engineering, but it is not yet clear just how big an issue this is, or should be, within the SE community.

"Everyone has heard the claim, 'Correlation does not imply causation.' What might sound like a reasonable dictum metastasized in the twentieth century into one of science's biggest obstacles, as a legion of researchers became unwilling to make the claim that one thing could cause another. Even two decades ago, asking a statistician a question like 'Was it the aspirin that stopped my headache?' would have been like asking if he believed in voodoo, or at best a topic for conversation at a cocktail party rather than a legitimate target of scientific inquiry. Scientists… [could] posit only that the probability that one thing was associated with another. This all changed with Judea Pearl, whose work on causality was not just a victory for common sense, but a revolution in the study of the world." [Google Book Review Quote]

I know I have given this book a very broad-brush description. There are more detailed reviews on the Internet. My hope is to convince you that by reading this book you will gain a powerful engineering tool: a framework for investigating and using causality in system analysis, and a way to determine when the causal inference techniques in the book apply, given the data you have or the data you need to get.

Very Respectfully,

Mark R. Evans, ASEP

President 2020, INCOSE Chesapeake Chapter