- Analyzing the correlation between price and demand in economics
- Understanding student performance across subjects
- Measuring marketing expenditure vs. sales
- Identifying trends in medical and social sciences
Methods of Karl Pearson’s Coefficient of Correlation:
1. Actual Mean Method (Deviation from Actual Mean)
Formula:
r = ∑(x − x̄)(y − ȳ) / √[∑(x − x̄)² · ∑(y − ȳ)²]
Where:
r = Karl Pearson’s correlation coefficient
x̄ = Mean of variable X
ȳ = Mean of variable Y
x, y = Individual values of variables X and Y
Use When:
- You have a small dataset
- The actual means of both variables are easy to calculate
Example Use Case: Used in classroom or exam performance correlation where averages are easily calculated.
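The Actual Mean Method can be sketched in Python as follows; the exam-marks data below are hypothetical, used only to illustrate the calculation:

```python
import math

def pearson_actual_mean(x, y):
    """Pearson's r via deviations from the actual means (Actual Mean Method)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Sum of the products of deviations from the two means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical marks of five students in two subjects
maths = [65, 70, 75, 80, 85]
stats = [60, 68, 72, 79, 86]
print(round(pearson_actual_mean(maths, stats), 4))  # 0.9961
```

Here the means (75 and 73) come out as whole numbers, which is exactly the situation where this method is convenient by hand.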
2. Assumed Mean Method
Formula:
r = [∑dx·dy − (∑dx)(∑dy)/n] / √{[∑dx² − (∑dx)²/n] · [∑dy² − (∑dy)²/n]}
Where:
r = Karl Pearson’s correlation coefficient
dx = x – A (Deviation of X from assumed mean A)
dy = y – B (Deviation of Y from assumed mean B)
n = Number of observations
Use When:
- Data values are large, or the exact means are awkward to compute
- You want to simplify manual calculation
Example Use Case: Used when data like income, population, or marks are large, and approximate means make calculations easier.
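A sketch of the Assumed Mean Method, reusing the same hypothetical marks. The correction terms (∑dx)(∑dy)/n and (∑dx)²/n cancel out the error introduced by guessing the means, so any choice of assumed means A and B yields the same r:

```python
import math

def pearson_assumed_mean(x, y, A, B):
    """Pearson's r via deviations from assumed means A and B."""
    n = len(x)
    dx = [xi - A for xi in x]  # deviations of X from assumed mean A
    dy = [yi - B for yi in y]  # deviations of Y from assumed mean B
    num = sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy) / n
    den = math.sqrt((sum(a * a for a in dx) - sum(dx) ** 2 / n)
                    * (sum(b * b for b in dy) - sum(dy) ** 2 / n))
    return num / den

# Hypothetical marks; assumed means 70 and 70 keep the deviations small
maths = [65, 70, 75, 80, 85]
stats = [60, 68, 72, 79, 86]
print(round(pearson_assumed_mean(maths, stats, 70, 70), 4))  # 0.9961
```

The result agrees with the Actual Mean Method, as it must, since only the arithmetic shortcut differs.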
3. Direct Method (Raw Score Method)
Formula:
r = [n∑xy − (∑x)(∑y)] / √{[n∑x² − (∑x)²] · [n∑y² − (∑y)²]}
Where:
r = Karl Pearson’s correlation coefficient
n = Number of data pairs
∑xy = Sum of the products of paired scores
∑x = Sum of X values
∑y = Sum of Y values
∑x² = Sum of squares of X
∑y² = Sum of squares of Y
Use When:
- You have complete raw scores (not deviations)
- Data is entered directly into software or spreadsheets
Example Use Case: Used in software-based or spreadsheet-based analysis like Excel, SPSS, or R, where summations can be automated.
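The Direct Method maps straight onto running sums, which is why it suits spreadsheets and code. A sketch with hypothetical advertising-spend and sales figures:

```python
import math

def pearson_direct(x, y):
    """Pearson's r from raw scores (Direct Method)."""
    n = len(x)
    sx, sy = sum(x), sum(y)                       # ∑x, ∑y
    sxy = sum(a * b for a, b in zip(x, y))        # ∑xy
    sx2 = sum(a * a for a in x)                   # ∑x²
    sy2 = sum(b * b for b in y)                   # ∑y²
    num = n * sxy - sx * sy
    den = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return num / den

# Hypothetical marketing spend (in lakhs) vs. sales
spend = [10, 12, 15, 18, 20]
sales = [40, 46, 54, 62, 70]
print(round(pearson_direct(spend, sales), 4))  # ≈ 0.9977
```

In a spreadsheet the same five summations (∑x, ∑y, ∑xy, ∑x², ∑y²) would each be a single SUM/SUMPRODUCT cell.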
Summary Table of Methods of Karl Pearson’s Coefficient of Correlation
| Method | Formula Type | Best For | Advantage |
|---|---|---|---|
| Actual Mean Method | Deviation from mean | Small datasets | Accurate, uses true central tendency |
| Assumed Mean Method | Deviation from assumed mean | Large datasets with large values | Simplifies calculation with approximations |
| Direct Method | Raw score formula | When using software or tools | Fastest with computing tools |
Properties of Coefficient of Correlation:
1. Value Lies Between –1 and +1
The coefficient of correlation always ranges from –1 to +1.
- r = +1: Perfect positive linear correlation
- r = –1: Perfect negative linear correlation
- r = 0: No linear correlation
2. Unit-Free (Dimensionless)
The coefficient of correlation is a pure number without units. It remains the same regardless of the scale or units of measurement, such as kilograms, dollars, or centimeters.
3. Symmetrical Between Variables
The correlation between X and Y is identical to the correlation between Y and X.
r(X,Y) = r(Y,X)
4. Unaffected by Origin and Scale (Except Multiplication by Negative Number)
If the variables are transformed linearly (e.g., u = aX + b), the value of r remains unchanged, provided a > 0.
- Addition or subtraction (change in origin): no effect
- Multiplication by a positive constant (change in scale): no effect
- Multiplication by a negative constant: changes the sign of r
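This invariance under change of origin and scale is easy to verify numerically; the data below are illustrative:

```python
import math

def pearson(x, y):
    """Pearson's r from raw scores."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = math.sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
                    * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson(x, y)

u = [3 * xi + 10 for xi in x]   # change of origin and positive scale: u = 3x + 10
v = [-2 * yi for yi in y]       # multiplication by a negative constant
print(round(pearson(u, y), 6) == round(r, 6))   # True: r unchanged
print(round(pearson(x, v), 6) == round(-r, 6))  # True: only the sign flips
```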
5. Indicates Direction of Relationship
- If r > 0: X and Y increase together (positive relationship)
- If r < 0: X increases as Y decreases (negative relationship)
- If r = 0: No linear relationship
6. Sensitive to Outliers
Pearson’s r is highly sensitive to extreme values. A single outlier can significantly distort the value of the correlation coefficient, making the result unreliable.
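A small illustration of this sensitivity, with made-up data: five perfectly correlated points give r = 1, and appending a single extreme point drags r all the way to a negative value:

```python
import math

def pearson(x, y):
    """Pearson's r from raw scores."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = math.sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
                    * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                  # perfectly linear: r = 1
print(round(pearson(x, y), 3))        # 1.0
# One outlier (6, -20) reverses the apparent direction of the relationship
print(round(pearson(x + [6], y + [-20]), 3))  # ≈ -0.438
```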
7. Only Measures Linear Relationship
The coefficient measures only linear association between variables.
If the relationship is non-linear, Pearson’s r may be close to 0 even if a strong association exists in another form (e.g., quadratic, exponential).
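A classic illustrative case: y = x² over a range symmetric about zero is a perfect (deterministic) relationship, yet Pearson's r is exactly 0 because the association is not linear:

```python
import math

def pearson(x, y):
    """Pearson's r from raw scores."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = math.sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
                    * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]   # y is completely determined by x
print(pearson(x, y))        # 0.0 — the linear measure misses the relationship
```

A scatter plot would reveal the parabola instantly, which is why plotting the data before interpreting r is always advisable.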
8. Does Not Imply Causation
Even a strong correlation does not mean one variable causes the other. Correlation simply shows that the variables move together, not why they do so.
Assumptions of Karl Pearson’s Coefficient of Correlation:
- Linearity
Pearson’s method assumes a linear relationship between the two variables: a change in one variable is accompanied by a proportional change in the other. If the relationship is non-linear (e.g., curved), Pearson’s coefficient may give misleading results.
- Quantitative and Continuous Data
Both variables must be quantitative (numerical) and measured on an interval or ratio scale. Pearson’s method is not suitable for categorical or ordinal data.
- No Extreme Outliers
The data should be free from extreme outliers or influential values, as they can significantly distort the correlation coefficient and misrepresent the actual relationship.
- Normal Distribution (for inference)
While not required for calculating correlation, a bivariate normal distribution is assumed when performing hypothesis tests or significance testing based on Pearson’s r.
- Homoscedasticity
The variance of one variable should be relatively constant across levels of the other variable. In other words, the data points should form a roughly even “cloud” in a scatter plot rather than a funnel shape.
- Independence of Observations
Each data pair (xᵢ, yᵢ) should be independent of the others. Repeated or related observations violate this assumption and can bias the result.
- Both Variables Should Be Random
Both variables should ideally be from random samples. If one or both are fixed or deterministic, the result may not reflect a general relationship.
Limitations of Karl Pearson’s Coefficient of Correlation:
- Assumes a linear relationship only
- Sensitive to extreme values (outliers)
- Requires quantitative data
- Can be misinterpreted without context or a scatter plot