Karl Pearson’s Coefficient of Correlation: Concept, Uses, Methods, Properties, Assumptions and Limitations

Karl Pearson’s Coefficient of Correlation is a statistical measure that evaluates the strength and direction of the linear relationship between two continuous variables. It is denoted by ‘r’ and ranges between –1 and +1. A value of +1 indicates a perfect positive linear correlation, meaning both variables increase together; –1 denotes a perfect negative linear correlation, where one variable increases while the other decreases. A value of 0 implies no linear relationship.

Developed by British statistician Karl Pearson, this method is one of the most widely used techniques in correlation analysis. The coefficient is calculated using either raw scores or deviations from the mean, and it considers all paired values in the dataset. It is particularly useful in fields like economics, business, psychology, and natural sciences for forecasting, hypothesis testing, and decision-making.

However, it assumes a linear relationship and is highly sensitive to outliers, which can distort results. Also, while it shows association, it does not imply causation. Despite these limitations, it remains a powerful and foundational tool for understanding relationships between variables in statistical analysis.

Uses of Karl Pearson’s Coefficient:

  • Analyzing the correlation between price and demand in economics

  • Understanding student performance across subjects

  • Measuring marketing expenditure vs. sales

  • Identifying trends in medical and social sciences

Methods of Karl Pearson’s Coefficient of Correlation:

1. Actual Mean Method (Deviation from Actual Mean)

Formula:

r = ∑(x – x̄)(y – ȳ) / √[∑(x – x̄)² × ∑(y – ȳ)²]

Where:

r = Karl Pearson’s correlation coefficient

x̄ = Mean of variable X

ȳ = Mean of variable Y

x, y = Individual values of variables X and Y

Use When:

  • You have small datasets

  • You can calculate the actual mean for both variables

Example Use Case: Used in classroom or exam performance correlation where averages are easily calculated.
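A minimal sketch of the actual mean method in Python; the exam scores below are illustrative, not taken from the text:

```python
from math import sqrt

def pearson_actual_mean(x, y):
    """Pearson's r via deviations from the actual means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of summed squared deviations
    den = sqrt(sum((xi - x_bar) ** 2 for xi in x) *
               sum((yi - y_bar) ** 2 for yi in y))
    return num / den

# Illustrative exam scores for five students in two subjects
maths   = [35, 40, 60, 79, 83]
physics = [30, 45, 55, 70, 85]
print(round(pearson_actual_mean(maths, physics), 4))  # ≈ 0.9652
```

The deviations sum to zero by construction, which is why this form is considered the most transparent for hand calculation on small datasets.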

2. Assumed Mean Method

Formula:

r = [∑dx·dy – (∑dx)(∑dy)/n] / √{[∑dx² – (∑dx)²/n] × [∑dy² – (∑dy)²/n]}

Where:

r = Karl Pearson’s correlation coefficient

dx = x – A (Deviation of X from assumed mean A)

dy = y – B (Deviation of Y from assumed mean B)

n = Number of observations

Use When:

  • Data values are large, making exact means awkward to compute

  • You want to simplify calculations

Example Use Case: Used when data like income, population, or marks are large, and approximate means make calculations easier.
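A sketch of the assumed mean method; the marks data and the assumed means are illustrative. Note that the choice of A and B does not change the result, which is the whole point of the shortcut:

```python
from math import sqrt

def pearson_assumed_mean(x, y, A, B):
    """Pearson's r from deviations about assumed means A and B."""
    n = len(x)
    dx = [xi - A for xi in x]   # deviations of X from assumed mean A
    dy = [yi - B for yi in y]   # deviations of Y from assumed mean B
    num = sum(d1 * d2 for d1, d2 in zip(dx, dy)) - sum(dx) * sum(dy) / n
    den = sqrt((sum(d ** 2 for d in dx) - sum(dx) ** 2 / n) *
               (sum(d ** 2 for d in dy) - sum(dy) ** 2 / n))
    return num / den

marks_x = [35, 40, 60, 79, 83]
marks_y = [30, 45, 55, 70, 85]
# Any convenient assumed means yield the same r (round numbers ease hand work)
r1 = pearson_assumed_mean(marks_x, marks_y, A=60, B=55)
r2 = pearson_assumed_mean(marks_x, marks_y, A=50, B=50)
print(round(r1, 4), round(r2, 4))  # both ≈ 0.9652
```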

3. Direct Method (Raw Score Method)

Formula:

r = [n(∑xy) – (∑x)(∑y)] / √{[n(∑x²) – (∑x)²] × [n(∑y²) – (∑y)²]}

Where:

r = Karl Pearson’s correlation coefficient

n = Number of data pairs

∑xy = Sum of the products of paired scores

∑x = Sum of X values

∑y = Sum of Y values

∑x² = Sum of squares of X

∑y² = Sum of squares of Y

Use When:

  • You have complete raw scores (not deviations)

  • Data is entered directly into software or spreadsheets

Example Use Case: Used in software-based or spreadsheet-based analysis like Excel, SPSS, or R, where summations can be automated.
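A sketch of the direct (raw score) method; the marketing spend and sales figures below are illustrative:

```python
from math import sqrt

def pearson_direct(x, y):
    """Pearson's r from raw sums — the form spreadsheets automate."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    num = n * sum_xy - sum_x * sum_y
    den = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return num / den

ad_spend = [10, 12, 15, 23, 20]   # illustrative marketing expenditure
sales    = [14, 17, 23, 25, 21]   # illustrative sales figures
print(round(pearson_direct(ad_spend, sales), 4))  # ≈ 0.8646
```

Because this form needs only running totals and never the means themselves, it is the version implemented by spreadsheet functions and statistical software.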

Summary Table of Methods of Karl Pearson’s Coefficient

Method              | Formula Type                 | Best For                          | Advantage
Actual Mean Method  | Deviation from actual mean   | Small datasets                    | Accurate; uses the true central tendency
Assumed Mean Method | Deviation from assumed mean  | Large datasets with large values  | Simplifies calculation with approximations
Direct Method       | Raw score formula            | Software or spreadsheet analysis  | Fastest with computing tools

Properties of Coefficient of Correlation:

1. Value Lies Between –1 and +1

The coefficient of correlation always ranges from –1 to +1.

  • r = +1: Perfect positive linear correlation
  • r = –1: Perfect negative linear correlation
  • r = 0: No linear correlation

2. Unit-Free (Dimensionless)

The coefficient of correlation is a pure number without units. It remains the same regardless of the scale or units of measurement, such as kilograms, dollars, or centimeters.

3. Symmetrical Between Variables

The correlation between X and Y is identical to the correlation between Y and X.

r(X,Y) = r(Y,X)

4. Unaffected by Origin and Scale (Except Multiplication by Negative Number)

If the variables are transformed linearly (e.g., u = aX + b), the value of r remains unchanged, provided a > 0.

  • Addition or subtraction (change in origin): no effect
  • Multiplication by a positive constant (change in scale): no effect
  • Multiplication by a negative constant: changes the sign of r
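These three cases can be checked numerically. The helper below uses the raw-score formula from the methods section, and the data is illustrative:

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r via the raw score (direct) formula."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2) *
               (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

x = [2, 4, 6, 8, 10]
y = [3, 7, 5, 11, 14]
r = pearson(x, y)
# Change of origin (+7) and positive change of scale (×3) leave r untouched
assert abs(pearson([3 * xi + 7 for xi in x], y) - r) < 1e-9
# Multiplying one variable by a negative constant flips the sign of r
assert abs(pearson([-2 * xi for xi in x], y) + r) < 1e-9
print("invariance holds, r =", round(r, 4))
```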

5. Indicates Direction of Relationship

  • If r > 0: X and Y increase together (positive relationship)
  • If r < 0: X increases as Y decreases (negative relationship)
  • If r = 0: No linear relationship

6. Sensitive to Outliers

Pearson’s r is highly sensitive to extreme values. A single outlier can significantly distort the value of the correlation coefficient, making the result unreliable.
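The effect of a single outlier can be demonstrated with a small illustrative dataset (the raw-score formula from above is reused as a helper):

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r via the raw score (direct) formula."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2) *
               (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

# Nine well-behaved points with a strong positive trend
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 3, 5, 6, 8, 9, 11, 12, 14]
print(round(pearson(x, y), 3))                    # ≈ 0.998

# Appending one extreme point drags the coefficient sharply downward
print(round(pearson(x + [10], y + [-40]), 3))     # ≈ -0.294
```

A near-perfect positive correlation becomes a weak negative one on the strength of a single point, which is why plotting the data before trusting r is essential.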

7. Only Measures Linear Relationship

The coefficient measures only linear association between variables.
If the relationship is non-linear, Pearson’s r may be close to 0 even if a strong association exists in another form (e.g., quadratic, exponential).
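A quadratic relationship illustrates this: below, y is completely determined by x, yet Pearson's r is exactly zero because the association is not linear (data is illustrative; the raw-score formula is reused as a helper):

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r via the raw score (direct) formula."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2) *
               (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

# A perfect (but non-linear) relationship: y = x**2 over a symmetric range
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
print(pearson(x, y))  # 0.0 — yet y is fully determined by x
```

The positive and negative halves of the parabola cancel in the cross-product sum, so the linear measure sees nothing.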

8. Does Not Imply Causation

Even a strong correlation does not mean one variable causes the other. Correlation simply shows that the variables move together, not why they do so.

Assumptions of Karl Pearson’s Coefficient of Correlation:

  • Linearity

It assumes a linear relationship between the two variables; that is, a change in one variable corresponds to a proportional change in the other. If the relationship is non-linear (e.g., curved), Pearson’s coefficient may give misleading results.

  • Quantitative and Continuous Data

Both variables must be quantitative (numerical) and measured on an interval or ratio scale. Pearson’s method is not suitable for categorical or ordinal data.

  • No Extreme Outliers

The data should be free from extreme outliers or influential values, as they can significantly distort the correlation coefficient and misrepresent the actual relationship.

  • Normal Distribution (for inference)

While not required for calculating correlation, a bivariate normal distribution is assumed when performing hypothesis tests or significance testing based on Pearson’s r.

  • Homoscedasticity

The variance of one variable should be relatively constant across levels of the other variable. In other words, the data points should form a roughly even “cloud” in a scatter plot rather than a funnel shape.

  • Independence of Observations

Each data pair (xi,yi) should be independent of others. Repeated or related observations violate this assumption and can bias the result.

  • Both Variables Should Be Random

Both variables should ideally be from random samples. If one or both are fixed or deterministic, the result may not reflect a general relationship.

Limitations of Karl Pearson’s Coefficient of Correlation:

  • Assumes linear relationship only

  • Sensitive to extreme values (outliers)

  • Requires quantitative data

  • Can be misinterpreted without context or scatter plot
