Niel Nickolaisen returns for another Guest Spotlight. Today, Niel turns his attention to the importance of defining cause or correlation in decision making, particularly around the ongoing situation concerning Covid-19.
If I were the benevolent dictator of the world of data, data analytics, machine learning and artificial intelligence, I would require (after all, I am a dictator) everyone to understand the difference between correlation and cause. The associated training would emphasize that cause can be much more challenging to establish than correlation but is necessary if we want to make valid decisions using data and artificial intelligence. The past months of Covid-19 decision-making have highlighted the importance of knowing the difference between correlation and cause.
For example, one study indicated that people with type O blood were less likely to contract Covid-19 than those with other blood types. Really? How did the researchers/analysts isolate all of the potential variables (and correlations) in the group they studied to narrow the differences down to just blood type? I would imagine it would be pretty straightforward to eliminate the obvious things like age and gender, but then things get a bit more difficult.
What about activity level and types of activities? What about social and economic conditions? Neighborhoods? Diet? Overall health? Stress levels? Working conditions? Movement patterns? Do people with type O blood have more stairs in their houses and apartments? And does any of that matter when it comes to Covid-19 resistance? On and on and on. I have type O blood and so would like to think that the research is valid and that I am thus Covid-invulnerable, but how can I or anyone know which are the principal determinants of Covid-19 resistance?
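To make the confounding concern concrete, here is a minimal simulation sketch. All of the numbers and variable names are invented for illustration, not drawn from any study: a hidden "exposure" factor (think working conditions) drives infection risk, and a causally irrelevant trait (a stand-in for blood type) happens to be more common in the low-exposure group. The trait ends up strongly correlated with lower infection rates even though it causes nothing.

```python
import random

random.seed(0)

# Purely hypothetical numbers, for illustration only.
n = 100_000
counts = {True: [0, 0], False: [0, 0]}  # has_trait -> [infected, total]

for _ in range(n):
    high_exposure = random.random() < 0.5
    # The confounder: the trait is more common among low-exposure people,
    # but the trait itself has NO effect on infection.
    has_trait = random.random() < (0.3 if high_exposure else 0.6)
    # Infection depends ONLY on exposure, never on the trait.
    infected = random.random() < (0.30 if high_exposure else 0.05)
    counts[has_trait][0] += infected
    counts[has_trait][1] += 1

rate_with_trait = counts[True][0] / counts[True][1]
rate_without_trait = counts[False][0] / counts[False][1]
print(f"infection rate with trait:    {rate_with_trait:.3f}")
print(f"infection rate without trait: {rate_without_trait:.3f}")
```

Run it and the trait group shows a markedly lower infection rate, a correlation that would vanish the moment the analysis controlled for exposure. That, in miniature, is why the blood type result deserves skepticism until the obvious and not-so-obvious variables are eliminated.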
Cause/Effect Clarity Is Key to Decision Making
As another Covid-19 example, two recent studies seemed to be in direct conflict with one another. One study claimed that there was nearly zero chance of asymptomatic spread (i.e., if I have Covid-19 but do not yet have symptoms, I cannot spread it to anyone else). Another study claimed that over 50% of Covid-19 cases were the result of asymptomatic spread. Which is it, and how do we make decisions based on these results? In my role as benevolent dictator, I would ensure that anyone doing research got to cause/effect clarity by clearly defining the potential correlation variables and then eliminating them as part of the analysis.
While Covid-19 data, research and conclusions are a timely example of the challenges of not knowing the distinction between correlation and cause, we encounter these challenges on a daily basis in our own decision-making. My company delivers employee recognition solutions to the market. Our claim is that employee recognition has a strong positive influence on employee engagement. That feels true, but to what extent is there a measurable cause-and-effect relationship between, for example, employee recognition and reduced employee turnover? In my perfect world, we could identify such a cause/effect relationship and do it in a quantitative way. With that in place, and knowing the cost to replace the employees we fail to retain, we could provide our clients and prospective clients with the financial impact of our employee recognition programs. With an even more detailed cause/effect relationship in place, we could prescribe specific actions our clients could take to improve employee retention.
Be Careful About Correlation When Cause Is Your Goal
But what data can we gather from our own processes, systems and transactions that establishes cause? Which data do we need to gather from our clients that will help us establish cause? Will having that data move us from correlation to cause? How do we eliminate the variables that might lead us to incorrect cause/effect conclusions? How do we measure what is going on outside of work in the lives of the employees? How do we measure the impact of an employee’s supervisor or co-workers? And, perhaps even more important, how much do such “immeasurables” influence our cause/effect conclusions? Or, in this case, do we simply generalize our analysis and settle for the correlation that visibly recognizing the efforts of the members of our teams will improve retention and employee engagement?
This all seems a bit depressing as many of us pursue advanced analytics, machine learning and artificial intelligence projects – hopefully designed to gather the data, develop and validate the models, and present us with cause/effect relationships that we can use to improve decision-making, predict results and prescribe actions. I am not saying that such projects are quixotic – I have worked on projects that were able to do all of that. Rather, my caution is for us to be careful about settling for correlation when cause is our goal. In my experience, that means we take the time to gather more data than we think we need, suspend our biases about what might be the causal relationships, and be willing to validate, through experimentation, what the analyses and models suggest. It is possible to get to cause (or close enough to cause to make massive improvements in our decision-making), but that often requires that our work get well past correlation.
About Our Guest Writer
Senior VP & Chief Information Officer, O.C. Tanner
Niel Nickolaisen is the CIO at O.C. Tanner. He has held technology executive and operations executive positions, typically in turnaround roles. He is the author of “The Agile Culture” (Addison-Wesley, 2014) and “Stand Back and Deliver” (Addison-Wesley, 2009). Niel is the winner of several IT leadership awards, including the ProphIT award, CIO Magazine 100 and Golden Bridge. In 2020, he was named one of the 20 CIOs changing the future of technology.