Mean Can Lie — Discover the Real Insights with Mean and Standard Deviation
The mean alone can be misleading. Learn how standard deviation, normal distributions, the empirical rule, and z-scores reveal the true story hidden in your data.
July 6, 2024 · 3 min read · By Kshitiz Regmi
The mean, often referred to as the average, is a measure of central tendency that summarizes a set of values into a single number. It represents the central point of the dataset — but the central point alone rarely tells the whole story.
Mean
The mean is calculated by dividing the sum of all N values in the dataset by N:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
In NumPy, np.mean computes this directly:
import numpy as np
d1 = np.arange(1, 6) # [1, 2, 3, 4, 5]
d2 = np.full((5,), fill_value=3) # [3, 3, 3, 3, 3]
print(np.mean(d1)) # 3.0
print(np.mean(d2)) # 3.0
Both datasets share the same mean — yet they are completely different. d1 has variability; d2 has none. The mean does not reflect distribution. It gives no information about the spread or shape of data.
Spread of Data: Standard Deviation
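Standard deviation measures how far data points typically lie from the mean.
Population Standard Deviation
Use this when the data covers the entire population of N values with mean μ:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
Sample Standard Deviation
Use this when the data is a sample of n values with sample mean x̄. Dividing by n − 1 instead of n (Bessel's correction) removes the bias in the variance estimate:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$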
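As a minimal sketch, both versions can be computed with np.std, whose ddof argument sets the divisor to N − ddof. The arrays d1 and d2 from the mean example are redefined here so the snippet runs on its own.

```python
import numpy as np

d1 = np.arange(1, 6)              # [1, 2, 3, 4, 5]
d2 = np.full((5,), fill_value=3)  # [3, 3, 3, 3, 3]

# ddof=0 (the default) divides by N: population standard deviation
pop_std_d1 = np.std(d1, ddof=0)
# ddof=1 divides by N - 1: sample standard deviation
sample_std_d1 = np.std(d1, ddof=1)

print(round(pop_std_d1, 4))     # 1.4142
print(round(sample_std_d1, 4))  # 1.5811
print(np.std(d2))               # 0.0
```

The two datasets share a mean of 3.0, but only the standard deviation shows that d2 has no spread at all.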
Normal Distribution
A normal (Gaussian) distribution is a continuous probability distribution defined by two parameters: mean (μ) and standard deviation (σ). Its probability density function produces the classic bell curve.
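import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
np.random.seed(0)  # fix the seed so the plot is reproducible
x = np.random.normal(loc=5, scale=1, size=1500) # loc=mean, scale=std
sns.histplot(x, kde=True)  # histogram with a kernel density estimate overlay
plt.show()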

- Mean (μ) determines the center of the distribution.
- Standard deviation (σ) determines the spread.
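To make the roles of μ and σ concrete, here is a rough sketch that writes out the normal density by hand and checks two of its properties numerically. The function name normal_pdf and the grid settings are illustrative choices, not part of any library.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Evaluate on a fine grid covering mu +/- 8 sigma
mu, sigma = 5.0, 1.0
xs = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 16001)
dx = xs[1] - xs[0]
density = normal_pdf(xs, mu, sigma)

area = np.sum(density) * dx      # Riemann-sum approximation of the total area
peak_x = xs[np.argmax(density)]  # location of the maximum density

print(round(area, 6))  # close to 1
print(peak_x)          # close to mu
```

The curve peaks at μ, and the total area stays at 1 whatever σ is, so a larger σ has to produce a lower, wider bell.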
Same Mean, Different Standard Deviations
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
np.random.seed(0)
d1 = np.random.normal(loc=5, scale=1, size=1500)  # narrow spread
d2 = np.random.normal(loc=5, scale=4, size=1500)  # wide spread, same mean
sns.kdeplot(d1, fill=True, label='mean=5, std=1')
sns.kdeplot(d2, fill=True, label='mean=5, std=4')
plt.axvline(5, color='red')  # mark the shared mean
plt.legend()
plt.show()

- Low standard deviation → data points cluster tightly around the mean.
- High standard deviation → data points spread over a much wider range.
This is precisely why the mean can lie. When the standard deviation is large, individual values can fall far from the mean, so the mean alone says little about what any single observation looks like.
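The same comparison can be made numerically. The sketch below assumes two synthetic samples drawn with NumPy's seeded default_rng generator; the names narrow and wide are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility
narrow = rng.normal(loc=5, scale=1, size=1500)
wide = rng.normal(loc=5, scale=4, size=1500)

# Print the mean next to the spread and the observed range of each sample
for name, data in [("std=1", narrow), ("std=4", wide)]:
    print(name,
          "mean:", round(data.mean(), 2),
          "std:", round(data.std(), 2),
          "min:", round(data.min(), 2),
          "max:", round(data.max(), 2))
```

Both means should land near 5, while the standard deviation and the min/max range differ by roughly a factor of four.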
The Empirical Rule (68-95-99.7 Rule)
For any normal distribution:
| Range | % of Data |
|---|---|
| μ ± 1σ | ~68% |
| μ ± 2σ | ~95% |
| μ ± 3σ | ~99.7% |
This rule is invaluable for anomaly detection — a data point beyond 3σ from the mean is statistically rare (~0.3% chance).
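A quick way to convince yourself of these percentages is to count how many simulated points fall inside each band. This is a sketch on synthetic data; the values of mu, sigma, and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 4.0  # illustrative parameters
x = rng.normal(loc=mu, scale=sigma, size=100_000)

fractions = {}
for k in (1, 2, 3):
    within = np.abs(x - mu) <= k * sigma  # True where x lies inside mu +/- k*sigma
    fractions[k] = within.mean()          # share of True values
    print(f"within {k} sigma: {fractions[k]:.3f}")
```

The printed shares should come out near 0.68, 0.95, and 0.997, with small deviations due to sampling noise.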
Normal vs Standard Normal Distribution
| Property | Normal | Standard Normal |
|---|---|---|
| Mean | μ | 0 |
| Variance | σ² | 1 |
The z-score transforms any normal distribution into the standard normal distribution through standardization, z = (x − μ) / σ: subtract the mean, then divide by the standard deviation. This is the same technique used for feature scaling in machine learning:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy.stats import zscore
np.random.seed(0)
normal_dist = np.random.normal(loc=5, scale=4, size=1500)
standard_normal_dist = zscore(normal_dist)  # (x - mean) / std for every value
sns.kdeplot(normal_dist, fill=True, label='Normal Distribution')
sns.kdeplot(standard_normal_dist, fill=True, label='Standard Normal Distribution')
plt.legend()
plt.show()
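Standardization is simple enough to write by hand. The sketch below uses only NumPy and the population standard deviation (ddof=0), which is also the default in scipy.stats.zscore. The sample is regenerated with a seeded generator, so its values will differ from the plot above.

```python
import numpy as np

rng = np.random.default_rng(0)
normal_dist = rng.normal(loc=5, scale=4, size=1500)

# z = (x - mean) / std, using the population standard deviation (ddof=0)
z = (normal_dist - normal_dist.mean()) / normal_dist.std(ddof=0)

print(round(abs(z.mean()), 6))  # 0.0 up to floating-point error
print(round(z.std(), 6))        # 1.0 up to floating-point error
```

After the transform the data has mean 0 and standard deviation 1 by construction. Its shape is unchanged; only the axis is shifted and rescaled.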

Takeaways
- Always pair the mean with the standard deviation. A mean without variance is incomplete information.
- Z-scores reveal outliers. Any point with |z| > 3 is a strong outlier candidate.
- Feature scaling in ML (StandardScaler, zscore) applies the same standardization: it centers each feature at 0 with unit variance, which helps gradient descent converge more stably.
- The empirical rule is your quick sanity check for normally distributed data.
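As a small illustration of the outlier takeaway, the sketch below flags points more than three standard deviations from the mean. The readings array is hypothetical synthetic data with one injected anomaly, and the threshold of 3 is a rule of thumb rather than a fixed standard.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sensor readings: roughly normal around 50, plus one injected anomaly
readings = np.append(rng.normal(loc=50, scale=5, size=1000), 120.0)

z = (readings - readings.mean()) / readings.std()
outliers = readings[np.abs(z) > 3]  # threshold follows the 3-sigma rule of thumb

print(outliers)
```

The injected value 120.0 is flagged, possibly alongside a few ordinary points from the tails. This check needs a reasonably large sample: in a very small dataset a single extreme value inflates the standard deviation enough that no point can reach |z| > 3.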
Understanding these concepts is foundational for statistical analysis, model evaluation, and trustworthy data storytelling.