Mean Can Lie — Discover the Real Insights with Mean and Standard Deviation

The mean alone can be misleading. Learn how standard deviation, normal distributions, the empirical rule, and z-scores reveal the true story hidden in your data.

July 6, 2024 · 3 min read · By Kshitiz Regmi

The mean, often referred to as the average, is a measure of central tendency that summarizes a set of values into a single number. It represents the central point of the dataset — but the central point alone rarely tells the whole story.

Mean

The mean is calculated by dividing the sum of all values by the number of values in the dataset:

$$\mu = \frac{\sum_{i=1}^n x_i}{n}$$

import numpy as np

d1 = np.arange(1, 6)        # [1, 2, 3, 4, 5]
d2 = np.full((5,), fill_value=3)  # [3, 3, 3, 3, 3]

print(np.mean(d1))  # 3.0
print(np.mean(d2))  # 3.0

Both datasets share the same mean, yet they are completely different: d1 has variability and d2 has none. The mean does not reflect the distribution; it gives no information about the spread or shape of the data.

Spread of Data: Standard Deviation

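Standard deviation measures how far data points typically lie from the mean. It is the square root of the average squared deviation, so it is expressed in the same units as the data. Use the population formula when the data covers the entire population, and the sample formula (dividing by n - 1 instead of N) when estimating the spread from a sample.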

Population Standard Deviation

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2}$$

Sample Standard Deviation

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}$$
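To make the difference between the two formulas concrete, here is a minimal sketch using NumPy's `np.std`, whose `ddof` argument sets the divisor to `N - ddof`. It reuses the d1 and d2 arrays from the mean example; the names `pop_std_d1` and `sample_std_d1` are illustrative only.

```python
import numpy as np

d1 = np.arange(1, 6)              # [1, 2, 3, 4, 5]
d2 = np.full((5,), fill_value=3)  # [3, 3, 3, 3, 3]

# ddof=0 (NumPy's default) divides by N: population standard deviation
pop_std_d1 = np.std(d1, ddof=0)
# ddof=1 divides by n - 1: sample standard deviation
sample_std_d1 = np.std(d1, ddof=1)

print(pop_std_d1)     # sqrt(2)   ≈ 1.414
print(sample_std_d1)  # sqrt(2.5) ≈ 1.581
print(np.std(d2))     # 0.0, because every value equals the mean
```

The two datasets have the same mean of 3.0, but the standard deviation separates them immediately.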

Normal Distribution

A normal (Gaussian) distribution is a continuous probability distribution defined by two parameters: mean μ and standard deviation σ. Its probability density function produces the classic bell curve.

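import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

np.random.seed(0)  # fixed seed so the plot is reproducible
x = np.random.normal(loc=5, scale=1, size=1500)  # loc=mean, scale=std
sns.histplot(x, kde=True)  # histogram with a kernel density estimate overlaid
plt.show()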

Figure: Normal distribution histogram

  • Mean (μ) determines the center of the distribution.
  • Standard deviation (σ) determines the spread.
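For reference, the density behind the bell curve is f(x) = 1 / (σ√(2π)) · exp(-(x - μ)² / (2σ²)). The sketch below evaluates it with NumPy and checks two properties: the curve peaks at the mean, and the area under it is approximately 1. `normal_pdf` is a hypothetical helper written for this illustration, not a library function, and the grid width of ±8σ is an arbitrary choice.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

mu, sigma = 5.0, 1.0
xs = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 16001)  # evenly spaced grid
density = normal_pdf(xs, mu, sigma)

dx = xs[1] - xs[0]
area = float(np.sum(density) * dx)       # simple Riemann-sum approximation
peak_x = float(xs[np.argmax(density)])   # grid point with the highest density

print(area)    # approximately 1
print(peak_x)  # approximately 5, the mean
```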

Same Mean, Different Standard Deviations

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

np.random.seed(0)
d1 = np.random.normal(loc=5, scale=1, size=1500)  # narrow: std=1
d2 = np.random.normal(loc=5, scale=4, size=1500)  # wide: std=4

sns.kdeplot(d1, fill=True, label='mean=5, std=1')
sns.kdeplot(d2, fill=True, label='mean=5, std=4')
plt.axvline(5, color='red')  # the mean shared by both samples
plt.legend()
plt.show()

Figure: Same mean, different standard deviations

  • Low standard deviation → data points cluster tightly around the mean.
  • High standard deviation → data points spread over a much wider range.

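This is precisely why the mean can lie. Both datasets have a mean of 5, but with a standard deviation of 1 nearly all values fall between 2 and 8, while with a standard deviation of 4 they range roughly from -7 to 17. A large standard deviation means any individual value is far less predictable, even if the mean looks "normal."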
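A quick numerical check makes the same point without a plot. This is a minimal sketch: it draws larger samples than the plots use (100,000 points, an arbitrary choice to reduce noise), and `share_near_mean` is a hypothetical helper defined only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator; the seed is arbitrary
narrow = rng.normal(loc=5, scale=1, size=100_000)
wide = rng.normal(loc=5, scale=4, size=100_000)

def share_near_mean(data, center=5.0, halfwidth=1.0):
    """Fraction of values that fall within center ± halfwidth."""
    return float(np.mean(np.abs(data - center) <= halfwidth))

for name, data in [("std=1", narrow), ("std=4", wide)]:
    # Report the mean, the standard deviation, and the share within 5 ± 1
    print(name, round(float(data.mean()), 2), round(float(data.std()), 2),
          round(share_near_mean(data), 2))
```

Both samples report a mean close to 5, but roughly 68% of the narrow sample lies within 5 ± 1 compared with roughly 20% of the wide one.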

The Empirical Rule (68-95-99.7 Rule)

For any normal distribution:

Range | % of Data
--- | ---
μ ± 1σ | ~68%
μ ± 2σ | ~95%
μ ± 3σ | ~99.7%

This rule is invaluable for anomaly detection — a data point beyond 3σ from the mean is statistically rare (~0.3% chance).
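The rule can be checked directly. The sketch below compares the observed share of simulated values within k standard deviations against the exact probability for a normal distribution, which is erf(k / √2). The parameters (mean 5, standard deviation 4, 200,000 draws) and the `coverage` dictionary are assumptions chosen for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 4.0
x = rng.normal(loc=mu, scale=sigma, size=200_000)

coverage = {}
for k in (1, 2, 3):
    # Observed fraction of samples within k standard deviations of the mean
    observed = float(np.mean(np.abs(x - mu) <= k * sigma))
    # Exact probability P(|Z| <= k) for a standard normal variable
    exact = math.erf(k / math.sqrt(2))
    coverage[k] = (observed, exact)
    print(f"within {k} sigma: observed={observed:.4f}, exact={exact:.4f}")
```

The exact values are 0.6827, 0.9545, and 0.9973, which is where the rounded 68-95-99.7 figures come from. The observed shares only match when the data is at least approximately normal.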

Normal vs Standard Normal Distribution

Property | Normal | Standard Normal
--- | --- | ---
Mean | μ | 0
Variance | σ² | 1

The Z-score transforms any normal distribution into the standard normal distribution through standardization — the same technique used for feature scaling in machine learning:

$$z = \frac{x_i - \mu}{\sigma}$$

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy.stats import zscore

np.random.seed(0)
normal_dist = np.random.normal(loc=5, scale=4, size=1500)
standard_normal_dist = zscore(normal_dist)  # (x - mean) / std, using ddof=0 by default

sns.kdeplot(normal_dist, fill=True, label='Normal Distribution')
sns.kdeplot(standard_normal_dist, fill=True, label='Standard Normal Distribution')
plt.legend()
plt.show()

Figure: Normal vs Standard Normal Distribution
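Standardization is simple enough to do by hand, which helps show what a z-score function computes. The following is a minimal sketch in plain NumPy; it divides by the population standard deviation (ddof=0), which is also the default in `scipy.stats.zscore`. The sample parameters are the same as in the plot, but the random generator differs, so the individual values will not match.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=4, size=1500)

# Standardize by hand: subtract the mean, then divide by the population std
z = (x - x.mean()) / x.std()

print(float(z.mean()))  # approximately 0
print(float(z.std()))   # approximately 1
```

Standardizing shifts and rescales the data but does not change its shape: a skewed dataset stays skewed after the transformation.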

Takeaways

  1. Always pair the mean with the standard deviation. A mean without variance is incomplete information.
  2. Z-scores reveal outliers. Any point with |z| > 3 is a strong outlier candidate.
  3. Feature scaling in ML (StandardScaler, zscore) applies the same standardization: it centers each feature at 0 with unit variance, which helps gradient descent converge more stably.
  4. The empirical rule is your quick sanity check for normally distributed data.
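As an illustration of the second takeaway, here is a minimal sketch of z-score outlier flagging. The data is synthetic: 1,000 draws from a normal distribution plus two injected anomalies with made-up values (95.0 and 2.0), and the threshold of 3 follows the rule of thumb above.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=5, size=1000)
values = np.append(values, [95.0, 2.0])  # two injected anomalies (made-up values)

# Z-score of every point relative to the mean and std of the whole sample
z = (values - values.mean()) / values.std()
outlier_mask = np.abs(z) > 3  # True where a point is more than 3 std from the mean

print(values[outlier_mask])
```

Both injected values are flagged, possibly alongside a few genuine draws from the tails. Note that extreme points also inflate the mean and standard deviation used to compute the z-scores, so this check works best when outliers are rare.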

Understanding these concepts is foundational for statistical analysis, model evaluation, and trustworthy data storytelling.