The Best Statistical Guide for Correlation vs Causation | 2021

Correlation vs Causation

Before we jump into the difference between correlation and causation, it is important to note that every experiment has independent and dependent variables. An independent variable is a piece of data that can be varied, whereas a dependent variable is a piece of data that is influenced by an outside factor, often the independent variable.

Correlation is a statistical measure that tells us how strongly a pair of variables are linearly related and change together. It does not tell us the why or the how behind the relationship; it only says that the relationship exists.

Example: The correlation between ice cream sales and the number of people suffering from sunburn.

As ice cream sales increase, so does the number of sunburns.

What are the different types of correlations?

• Positive correlation: Variables A and B move in the same direction. For example, as Variable A increases, so does B.
• Negative correlation: Variables A and B move in opposite directions. For example, as Variable A increases, B decreases.
• No correlation: There is no apparent link between Variables A and B.
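The three cases above can be illustrated with the Pearson correlation coefficient, which ranges from -1 (perfect negative) through 0 (no linear link) to +1 (perfect positive). The following is a minimal sketch; the function name `pearson_r` and the toy numbers are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data, invented purely for illustration.
a = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # rises as A rises
opposite_direction = [10, 8, 6, 4, 2]  # falls as A rises
no_linear_link = [1, -1, 0, -1, 1]     # no linear trend with A

print("positive:", pearson_r(a, same_direction))
print("negative:", pearson_r(a, opposite_direction))
print("none:    ", pearson_r(a, no_linear_link))
```

Real data rarely lands exactly on -1, 0, or +1; values in between indicate how strong the linear relationship is.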

Causation goes a step further than correlation. It says that a change in the value of one variable will cause a change in the value of another variable; in other words, one variable makes the other happen. It is also referred to as cause and effect.

Example: When a person exercises, the number of calories they burn per minute goes up. The former causes the latter.

Now that we know what correlation and causation are, it’s time to understand “Correlation does not imply causation!” with a famous example.

Ice cream sales are correlated with homicides in New York (Study)

As ice cream sales rise and fall, so does the number of homicides. Does eating ice cream cause people to die?

No. Just because two things are correlated doesn’t mean that one causes the other.

Correlation does not mean causality; in our example, ice cream is not causing anyone’s death. A more plausible explanation is a third factor, such as hot weather, that tends to move with both.

When two seemingly unrelated things are tied together, they may be bound by either causality or mere correlation.

In many cases, a correlation is just a coincidence. Just because it seems like one factor is influencing the other doesn’t mean that it actually does.

Correlation is what we see when we can’t look under the covers. The less information we have, the more we are forced to rely on observed correlations. Similarly, the more information we have, the more transparent things become and the more we are able to see the actual causal relationships.
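To see how two variables can be correlated without either one causing the other, here is a small simulation sketch. All numbers are synthetic, not real sales or crime data. It assumes a hidden third variable, temperature, that drives both ice cream sales and homicides. The helper names `pearson_r` and `residuals` are made up for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic world (an assumption): temperature drives BOTH variables,
# and ice cream has no effect on homicides at all.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [20 * t + rng.gauss(0, 60) for t in temperature]
homicides = [0.3 * t + rng.gauss(0, 2) for t in temperature]

raw_r = pearson_r(ice_cream, homicides)
# Correlation that remains once temperature is accounted for.
partial_r = pearson_r(residuals(ice_cream, temperature),
                      residuals(homicides, temperature))

print(f"raw correlation:             {raw_r:.2f}")
print(f"after removing temperature:  {partial_r:.2f}")
```

In this made-up setup the raw correlation is strong, but it largely disappears once temperature is taken into account. That is the "more information" point above: seeing the hidden variable makes the apparent link fall away.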

Correlation vs Causation: How to Tell if Something’s a Coincidence or a Causality

So how do you test your data so you can make bulletproof claims about causation? There are five ways to go about this, known technically as study designs. We list them from the most robust method to the weakest:

1. Randomized and Experimental Study

Say you want to test the new shopping cart in your ecommerce app. Your hypothesis is that there are too many steps before a user can actually check out and pay for their item, and that this difficulty is the friction point that blocks them from buying more often. So you’ve rebuilt the shopping cart in your app and want to see if this will increase the chances of users buying stuff.

The best way to establish causation is to set up a randomized experiment. This is where you randomly assign people to either the experimental group or the control group.

In experimental design, there is a control group and an experimental group, kept under identical conditions except for the one independent variable being tested. By assigning people to the groups at random, you avoid selection bias, where the way participants are chosen favors certain outcomes over others.

In our example, you would randomly assign users to test the new shopping cart you’ve prototyped in your app, while the control group would be assigned to use the current (old) shopping cart.

After the testing period, look at the data and see if the new cart leads to more purchases. If it does, and the difference is larger than chance alone would explain, you can claim a true causal relationship: your old cart was hindering users from making a purchase. The results will have the most validity to both internal stakeholders and other people outside your organization whom you choose to share it with, precisely because of the randomization.
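As a rough sketch of the shopping cart experiment, the code below randomly splits users into two groups and then checks whether the difference in purchase rates is bigger than chance would suggest. It uses a two-proportion z-test, which is one common choice and not the only one. The function names and the purchase counts are hypothetical.

```python
import math
import random

def assign_groups(user_ids, seed=0):
    """Randomly split users into (control, treatment) halves."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in purchase rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that the cart makes no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# control sees the old cart, treatment sees the new one.
control, treatment = assign_groups(range(4000))

# Made-up purchase counts, for illustration only.
z, p_value = two_proportion_z_test(conv_a=200, n_a=len(control),
                                   conv_b=260, n_b=len(treatment))
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value means a gap this large would be unlikely if the two carts performed the same. Because users were assigned at random, the new cart is the most plausible explanation for the gap.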

2. Quasi-Experimental Study

But what happens when you can’t randomize the process of selecting users to take the study? This is a quasi-experimental design. There are six types of quasi-experimental designs, each with various applications.

The problem with this method is that, without randomization, statistical tests are much harder to interpret. You cannot be totally sure whether the results are due to the variable being tested or to nuisance variables brought about by the absence of randomization.

Quasi-experimental studies will typically require more advanced statistical procedures to get the necessary insight. Researchers may use surveys, interviews, and observational notes as well – all complicating the data analysis process.

Let’s say you’re testing whether the user experience in your latest app version is less confusing than the old UX. And you’re specifically using your closed group of app beta testers. The beta test group wasn’t randomly selected since they all raised their hand to gain access to the latest features. So, proving correlation vs causation – or in this example, UX causing confusion – isn’t as straightforward as when using a random experimental study.

While scientists may shun the results from these studies as unreliable, the data you gather may still give you useful insight (think trends).
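The beta tester problem can be shown with a small simulation. This is a sketch under invented assumptions: each user has a "savviness" level between 0 and 1, savvier users are less confused by any UX, savvier users are more likely to volunteer for the beta, and the new UX has no real effect at all (`TRUE_EFFECT = 0.0`). None of these numbers come from real data.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_EFFECT = 0.0       # assumption: the new UX changes nothing

def confusion_score(savviness, uses_new_ux):
    """Hypothetical confusion score; savvier users are less confused."""
    return 10 - 5 * savviness + TRUE_EFFECT * uses_new_ux + rng.gauss(0, 1)

def observed_gap(savviness_levels, prob_new_ux):
    """Mean confusion on the new UX minus mean confusion on the old UX."""
    new_scores, old_scores = [], []
    for s in savviness_levels:
        if rng.random() < prob_new_ux(s):
            new_scores.append(confusion_score(s, 1))
        else:
            old_scores.append(confusion_score(s, 0))
    return (sum(new_scores) / len(new_scores)
            - sum(old_scores) / len(old_scores))

users = [rng.random() for _ in range(5000)]

# Beta group: the savvier the user, the more likely they raised their hand.
self_selected = observed_gap(users, lambda s: s)
# Randomized: every user has the same 50% chance of getting the new UX.
randomized = observed_gap(users, lambda s: 0.5)

print(f"self-selected gap: {self_selected:.2f}")
print(f"randomized gap:    {randomized:.2f}")
```

In this setup the self-selected comparison makes the new UX look less confusing even though it does nothing, because the volunteers were savvier to begin with. The randomized comparison stays close to zero.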

3. Correlational Study

A correlational study is when you try to determine whether two variables are correlated or not. If A increases and B correspondingly increases, that is a correlation. Just remember that correlation doesn’t imply causation and you’ll be alright.

For example, you decide you want to test whether a smoother UX has a strong positive correlation with better app store ratings. And after observation, you see that when one increases, the other does too. You’re not saying A (smooth UX) causes B (better ratings); you’re saying A is strongly associated with B, and perhaps might even predict it. That’s a correlation.

4. Single-Subject Study

Single-subject design is more often used in psychology and education, as it is concerned with an individual subject. Instead of a control and experimental group, the subject serves as his or her own control. The researcher is concerned about attempting to change the individual’s behavior or thinking.

In mobile marketing, a single-subject study might take the form of asking one specific user to test the usability of a new app feature. You can have them do one action several times on the current app, then have them try the same action on the new app version. Collect the data and see if the action is done faster on the old or new app.

Obviously, this design is using data from one user. His or her experience cannot be generalized to all your users no matter how perfect a fit to your ideal customer persona. That’s one reason why this type of study is rarely used in marketing.
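For a sense of what the analysis of such a single-user test might look like, here is a minimal sketch. The timings are invented, and pairing the first attempt on the old app with the first attempt on the new app (and so on) is an assumption made for illustration. It uses an exact sign-flip test, which asks how often a gap this large would appear if "old" and "new" labels were interchangeable.

```python
from itertools import product

# Hypothetical seconds per attempt for ONE user (invented numbers).
old_app_seconds = [12.1, 11.4, 10.9, 10.2, 9.8, 9.9]
new_app_seconds = [9.0, 8.4, 8.1, 7.9, 7.5, 7.7]

# Positive difference = the action was faster on the new app.
differences = [old - new for old, new in zip(old_app_seconds, new_app_seconds)]
observed = sum(differences) / len(differences)

def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip test on paired differences."""
    n = len(diffs)
    target = abs(sum(diffs) / n)
    extreme = 0
    # Try every way of swapping the old/new labels within each pair.
    for signs in product((1, -1), repeat=n):
        flipped_mean = sum(s * d for s, d in zip(signs, diffs)) / n
        if abs(flipped_mean) >= target - 1e-12:
            extreme += 1
    return extreme / 2 ** n

p_value = sign_flip_p_value(differences)
print(f"mean time saved: {observed:.2f}s, p = {p_value:.3f}")
```

Even a small p-value here only describes this one user. The user also gets practice on the old app first, which can make the later attempts faster for reasons unrelated to the new design.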

5. Stories

Anecdotes, sadly, are sometimes all the proof we have to establish causation. You might come across:

• Support staff: “Customers think the new user interface is tough to use. That’s why they’re uninstalling.”
• Customer X on Twitter: “We tried to buy a product on your app and it’s making my phone crash!”

The problem here is: while they could have a valid pain point and might make it in a convincing (and highly emotional) manner, these stories do not prove without a doubt that A causes B. They’re really just stories at this point, and carry less weight than the other options above.