CHAPTER 6 - Correlation
1. Introduction
Correlation is a statistical technique used to measure and describe the relationship between two variables.
For example, as the summer heat rises, more people visit hill stations and more ice cream is sold. Temperature, hill-station visits and ice cream sales tend to rise together, so these variables are correlated.
Correlation analysis answers questions like:
Is there any relationship between two variables?
Do both variables change together?
What is the strength of their relationship?
2. Types of Relationship
Correlation helps us understand if and how two variables move together.
2.1 Positive Correlation
A positive correlation occurs when two variables move in the same direction. For example:
As income rises, consumption also rises.
As temperature increases, ice cream sales increase.
2.2 Negative Correlation
A negative correlation occurs when two variables move in opposite directions. For example:
As the price of apples falls, the quantity demanded increases.
As you spend more time studying, the chances of failing decrease.
2.3 No Correlation
In some cases, there is no relationship between two variables. For example:
Your shoe size and the amount of money in your pocket have no relationship.
3. Measuring Correlation
Correlation is measured using various techniques. The three most important tools are:
Scatter Diagrams
Karl Pearson’s Coefficient of Correlation
Spearman’s Rank Correlation
4. Scatter Diagrams
A scatter diagram is a simple graphical method for examining the relationship between two variables.
Each pair of values of the two variables is plotted as a point on a graph, and the pattern formed by the points shows the direction and closeness of the relationship.
4.1 Perfect Positive Correlation
If all points lie on a straight upward-sloping line, it shows a perfect positive correlation. This means that every change in one variable is matched by a change in the other in the same direction and in a fixed proportion.
4.2 Perfect Negative Correlation
If all points lie on a straight downward-sloping line, it shows a perfect negative correlation: as one variable rises, the other falls in a fixed proportion.
4.3 No Correlation
If the points are scattered randomly with no clear pattern, there is no correlation.
5. Karl Pearson’s Coefficient of Correlation
Karl Pearson’s coefficient of correlation (r) is the most commonly used method for expressing the relationship between two variables as a precise number.
It measures both the direction and the strength of the linear relationship between them.
5.1 Formula
The formula for Karl Pearson’s Coefficient is:
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2} \cdot \sqrt{\sum (Y - \bar{Y})^2}}
Where:
r = Correlation coefficient
X and Y = The two variables
X̄ and Ȳ = Mean values of X and Y
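The formula can be computed directly from deviations about the means. The following is a minimal Python sketch (the chapter itself uses no code); the function name pearson_r and the data are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's coefficient computed from deviations about the means."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Deviations of each observation from its mean: (X - mean X), (Y - mean Y)
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    # Numerator: sum of the products of paired deviations
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square roots of the sums of squared deviations, multiplied
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sum_dxdy / (math.sqrt(sum_dx2) * math.sqrt(sum_dy2))


# Hypothetical data: mean of X = 3, mean of Y = 4
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))  # 6 / sqrt(10 * 6), roughly 0.775
```

For these data Σ(X - X̄)(Y - Ȳ) = 6, Σ(X - X̄)² = 10 and Σ(Y - Ȳ)² = 6, so r = 6/√60 ≈ 0.77, a fairly strong positive relationship.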
5.2 Properties
r has no units, making it easy to compare relationships between different variables.
r ranges from -1 to +1:
r = +1 means a perfect positive relationship.
r = -1 means a perfect negative relationship.
r = 0 means no linear relationship (the variables may still be related in a non-linear way).
6. Spearman’s Rank Correlation
Spearman’s Rank Correlation measures the relationship between two variables using their ranks (positions in order) instead of their actual values.
This makes it useful for qualitative attributes such as honesty or beauty, which cannot be measured numerically but can be ranked.
6.1 Formula
The formula for Spearman’s Rank Correlation is:
r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}
Where:
D = Difference in ranks between the two variables
n = Number of observations
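A minimal Python sketch of this formula follows, assuming there are no tied values (with ties, average ranks and a correction to the formula are needed, which this sketch does not attempt). The helper names ranks and spearman_rs, and the data, are hypothetical.

```python
def ranks(values):
    """Rank each value, 1 = smallest. Assumes all values are distinct."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks and a corrected formula")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's rank correlation: 1 - 6 * sum(D^2) / (n(n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rank_x, rank_y = ranks(x), ranks(y)
    # D = difference between the two ranks of each observation
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five contestants
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rs(judge_a, judge_b))  # 1 - 6*4 / (5*24) = 0.8

# Raw marks are converted to ranks first
marks_x = [35, 23, 47, 17, 10]
marks_y = [30, 33, 45, 23, 8]
print(spearman_rs(marks_x, marks_y))  # two ranks differ by 1: 1 - 12/120 = 0.9
```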
6.2 Use Cases
Spearman’s Rank Correlation is used when:
The data is ranked instead of measured.
The relationship between the variables is non-linear but consistently increasing or decreasing (monotonic).
There are extreme values in the data, which can distort Pearson’s coefficient.
7. Causation vs Correlation
It's important to remember that correlation does not imply causation.
Just because two variables are correlated doesn't mean one causes the other.
For example, ice cream sales and drowning deaths are highly correlated because both rise in summer, but one does not cause the other. Instead, high temperatures lead both to more ice cream sales and to more people swimming, which in turn leads to more drownings.
8. Conclusion
Correlation helps us understand the relationship between two variables; this relationship can be positive, negative, or absent.
Karl Pearson’s coefficient and Spearman’s rank correlation are two major techniques for measuring correlation.
However, correlation only shows relationships, not cause-and-effect.