Not to be confused with Chebyshev's inequalities on the size of a number-theoretic function.
In probability theory, Chebyshev's inequality (also spelled Tchebysheff's inequality; Russian: Нера́венство Чебышева) guarantees that, in any probability distribution with finite mean and variance, "nearly all" values are close to the mean. Precisely, no more than 1/k² of the distribution's values can be more than k standard deviations away from the mean (or equivalently, at least 1 − 1/k² of the distribution's values are within k standard deviations of the mean). The inequality has great utility because it can be applied to completely arbitrary distributions (unknown except for mean and variance). For example, it can be used to prove the weak law of large numbers.
In practical usage, in contrast to the empirical rule, which applies to normal distributions, Chebyshev's inequality guarantees only that at least 75% of values lie within two standard deviations of the mean and at least about 89% within three standard deviations.[1][2]
The term Chebyshev's inequality may also refer to Markov's inequality, especially in the context of analysis.
History
The theorem is named after Russian mathematician Pafnuty Chebyshev, although it was first formulated by his friend and colleague Irénée-Jules Bienaymé.[3]:98 The theorem was first stated without proof by Bienaymé in 1853[4] and later proved by Chebyshev in 1867.[5] His student Andrey Markov provided another proof in his 1884 PhD thesis.[6]
Statement
Chebyshev's inequality is usually stated for random variables, but can be generalized to a statement about measure spaces.
Probabilistic statement
Let X be a random variable with finite expected value μ and finite non-zero variance σ². Then for any real number k > 0,
$$\Pr(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}.$$
Only the case k > 1 provides useful information. When k < 1 the right-hand side is greater than one, so the inequality becomes vacuous, as the probability of any event cannot be greater than one. When k = 1 it just says the probability is less than or equal to one, which is always true.
As an example, using k = √2 shows that at least half of the values lie in the interval (μ − √2σ, μ + √2σ).
Because it can be applied to completely arbitrary distributions (unknown except for mean and variance), the inequality generally gives a poor bound compared to what might be possible if something is known about the distribution involved.
| k | Min % within k standard deviations of mean | Max % beyond k standard deviations from mean |
|---|---|---|
| 1 | 0% | 100% |
| √2 | 50% | 50% |
| 2 | 75% | 25% |
| 3 | 88.8889% | 11.1111% |
| 4 | 93.75% | 6.25% |
| 5 | 96% | 4% |
| 6 | 97.2222% | 2.7778% |
| 7 | 97.9592% | 2.0408% |
| 8 | 98.4375% | 1.5625% |
| 9 | 98.7654% | 1.2346% |
| 10 | 99% | 1% |
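The table entries are just 1 − 1/k² and 1/k² expressed as percentages. A minimal Python sketch that recomputes them is shown here; the function name `chebyshev_bounds` is chosen for illustration only.

```python
import math

def chebyshev_bounds(k: float) -> tuple[float, float]:
    """Return (min fraction within, max fraction beyond) k standard deviations."""
    if k <= 0:
        raise ValueError("k must be positive")
    beyond = min(1.0, 1.0 / k**2)   # a probability bound above 1 is vacuous
    return 1.0 - beyond, beyond

for k in (1, math.sqrt(2), 2, 3, 10):
    within, beyond = chebyshev_bounds(k)
    print(f"k = {k:7.4f}: at least {within:8.4%} within, at most {beyond:8.4%} beyond")
```

For k below 1 the sketch reports the trivial bounds 0% and 100%, matching the remark that only k > 1 is informative.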
Measure-theoretic statement
Let (X, Σ, μ) be a measure space, and let f be an extended real-valued measurable function defined on X. Then for any real number t > 0,[citation needed]
$$\mu(\{x \in X : |f(x)| \geq t\}) \leq \frac{1}{t^2} \int_X f^2 \, d\mu.$$
More generally, if g is an extended real-valued measurable function, nonnegative and nondecreasing on the range of f, then[citation needed]
$$\mu(\{x \in X : f(x) \geq t\}) \leq \frac{1}{g(t)} \int_X g \circ f \, d\mu.$$
The previous statement then follows by defining g(t) as t² if t ≥ 0 and 0 otherwise, and taking |f| instead of f.
Example
Suppose we randomly select a journal article from a source with an average of 1000 words per article, with a standard deviation of 200 words. We can then infer that the probability that it has between 600 and 1400 words (i.e. within k = 2 SDs of the mean) must be at least 75%, because by Chebyshev's inequality there is at most a 1/k² = 1/4 chance of being outside that range. But if we additionally know that the distribution is normal, we can say that there is a 75% chance the word count is between 770 and 1230 (which is an even tighter bound).
- Note
This example should be treated with caution as the inequality is only stated for probability distributions rather than for finite sample sizes. The inequality has since been extended to apply to finite sample sizes (see below).
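The figures in the word-count example can be checked with Python's standard library. The sketch below is illustrative only: the normal distribution is an extra assumption, as in the example, and the variable names are not from the source.

```python
from statistics import NormalDist

mean, sd = 1000, 200          # words per article, taken from the example
k = 2
lo, hi = mean - k * sd, mean + k * sd

# Chebyshev: valid for any distribution with this mean and standard deviation.
chebyshev_lower = 1 - 1 / k**2

# Extra assumption: word counts are exactly normally distributed.
normal = NormalDist(mean, sd)
normal_prob = normal.cdf(hi) - normal.cdf(lo)

# Central interval that contains 75% of the normal distribution.
q_lo, q_hi = normal.inv_cdf(0.125), normal.inv_cdf(0.875)

print(f"P({lo} <= words <= {hi}) >= {chebyshev_lower:.2f} (Chebyshev)")
print(f"P({lo} <= words <= {hi}) = {normal_prob:.4f} (normal)")
print(f"75% central interval if normal: {q_lo:.0f} to {q_hi:.0f}")
```

Under normality the interval from 600 to 1400 holds roughly 95% of articles, which shows how conservative the distribution-free 75% figure is.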
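Sharpness of bounds
As shown in the example above, the theorem will typically provide rather loose bounds. However, the bounds provided by Chebyshev's inequality cannot, in general (remaining sound for variables of arbitrary distribution), be improved upon. For example, for any k ≥ 1, the following distribution meets the bounds exactly.
$$X = \begin{cases} -1, & \text{with probability } \frac{1}{2k^2} \\ 0, & \text{with probability } 1 - \frac{1}{k^2} \\ 1, & \text{with probability } \frac{1}{2k^2} \end{cases}$$
For this distribution, mean μ = 0 and standard deviation σ = 1/k, so
$$\Pr(|X-\mu| \geq k\sigma) = \Pr(|X| \geq 1) = \frac{1}{k^2}.$$
Equality holds only for distributions that are a linear transformation of this one.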
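Sharpness can be confirmed with exact rational arithmetic. The sketch below assumes the extremal distribution is the three-point distribution on −1, 0, 1 with probabilities 1/(2k²), 1 − 1/k² and 1/(2k²), which is the one consistent with μ = 0 and σ = 1/k. The function names are illustrative.

```python
from fractions import Fraction

def extremal_distribution(k):
    """Three-point distribution on -1, 0, 1 (value -> probability)."""
    k = Fraction(k)
    tail = 1 / (2 * k**2)
    return {-1: tail, 0: 1 - 2 * tail, 1: tail}

def check_sharpness(k):
    """Return (mean, variance, P(|X - mean| >= k*sigma)) as exact fractions."""
    dist = extremal_distribution(k)
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    # Compare squared distances so that no square root is needed:
    # |x - mean| >= k*sigma  <=>  (x - mean)^2 >= k^2 * variance.
    tail_prob = sum(p for x, p in dist.items()
                    if (x - mean) ** 2 >= Fraction(k) ** 2 * variance)
    return mean, variance, tail_prob

for k in (1, 2, 3):
    mean, variance, tail_prob = check_sharpness(k)
    print(f"k = {k}: mean {mean}, variance {variance}, tail probability {tail_prob}")
```

In each case the tail probability equals 1/k² exactly, so the bound is attained.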
Proof (of the two-sided version)
Probabilistic proof
Markov's inequality states that for any non-negative random variable Y and any positive number a, we have Pr(|Y| > a) ≤ E(|Y|)/a. One way to prove Chebyshev's inequality is to apply Markov's inequality to the random variable Y = (X − μ)² with a = (kσ)².
It can also be proved directly. For any event A, let I_A be the indicator random variable of A, i.e. I_A equals 1 if A occurs and 0 otherwise. Then
$$\begin{aligned}
\Pr(|X-\mu| \geq k\sigma) &= \operatorname{E}\left(I_{|X-\mu| \geq k\sigma}\right) = \operatorname{E}\left(I_{[(X-\mu)/(k\sigma)]^2 \geq 1}\right) \\[6pt]
&\leq \operatorname{E}\left(\left(\frac{X-\mu}{k\sigma}\right)^2\right) = \frac{1}{k^2} \frac{\operatorname{E}((X-\mu)^2)}{\sigma^2} = \frac{1}{k^2}.
\end{aligned}$$
The direct proof shows why the bounds are quite loose in typical cases: wherever the indicator equals 1, it is replaced by [(X − μ)/(kσ)]², which is at least 1 there and in some cases exceeds 1 by a very wide margin. Wherever the indicator equals 0, it is replaced by a non-negative quantity.
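The two sides of the key step in the direct proof can be compared numerically. The sketch below treats a small hypothetical data set as an entire population, so that the mean and standard deviation are exact for its empirical distribution. The data values and function name are illustrative assumptions.

```python
from statistics import fmean, pstdev

def tail_and_bound(data, k):
    """Treat `data` as a whole population and evaluate both sides of the proof."""
    mu = fmean(data)
    sigma = pstdev(data, mu)
    # Left side: mean of the indicator of |x - mu| >= k*sigma.
    indicator_mean = fmean(1.0 if abs(x - mu) >= k * sigma else 0.0 for x in data)
    # Right side: mean of ((x - mu) / (k*sigma))^2, which equals 1/k^2.
    ratio_mean = fmean(((x - mu) / (k * sigma)) ** 2 for x in data)
    return indicator_mean, ratio_mean

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]   # hypothetical data with one outlier
for k in (1.5, 2, 3):
    tail, bound = tail_and_bound(data, k)
    print(f"k = {k}: tail fraction {tail:.2f} <= bound {bound:.4f}")
```

The single outlier contributes far more than 1 to the right-hand side, which is where most of the slack in the bound comes from.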
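Measure-theoretic proof
Fix t and let A_t be defined as A_t = {x ∈ X : f(x) ≥ t}, and let 1_{A_t} be the indicator function of the set A_t. Then, it is easy to check that, for any x,
$$0 \leq g(t)\, 1_{A_t}(x) \leq g(f(x))\, 1_{A_t}(x),$$
since g is nondecreasing on the range of f, and therefore,
$$g(t)\, \mu(A_t) = \int_X g(t)\, 1_{A_t} \, d\mu \leq \int_{A_t} g \circ f \, d\mu \leq \int_X g \circ f \, d\mu.$$
The desired inequality follows from dividing the above inequality by g(t).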
Extensions
Several extensions of Chebyshev's inequality have been developed.
Asymmetric two-sided case
An asymmetric two-sided version of this inequality is also known.[7]
When the distribution is asymmetric or is unknown,
$$P(k_1 < X < k_2) \geq \frac{4[(\mu - k_1)(k_2 - \mu) - \sigma^2]}{(k_2 - k_1)^2},$$
where σ² is the variance and μ is the mean.
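A short sketch of the asymmetric bound follows. It assumes that k1 < μ < k2, a condition the text does not state explicitly, and clips negative values to zero since they carry no information. The function name is illustrative.

```python
def asymmetric_lower_bound(mu, sigma, k1, k2):
    """Lower bound on P(k1 < X < k2) from the asymmetric two-sided inequality."""
    if not k1 < mu < k2:
        raise ValueError("this sketch assumes k1 < mu < k2")
    bound = 4 * ((mu - k1) * (k2 - mu) - sigma**2) / (k2 - k1) ** 2
    return max(0.0, bound)   # a negative lower bound is uninformative

# Symmetric limits reproduce the usual 1 - 1/k^2 (here k = 2).
print(asymmetric_lower_bound(1000, 200, 600, 1400))
# Asymmetric limits around the same mean.
print(asymmetric_lower_bound(1000, 200, 700, 1600))
```

With k1 = μ − kσ and k2 = μ + kσ the expression simplifies to 1 − 1/k², so the symmetric inequality is recovered as a special case.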
Bivariate case
A version for the bivariate case is known.[8]
Let X₁ and X₂ be two random variables with means μ₁ and μ₂ and finite variances σ₁² and σ₂², respectively. Then

where for i = 1,2,
$$T_i = \frac{4\sigma_i^2 + [2\mu_i - (k_{i1} + k_{i2})]^2}{(k_{i2} - k_{i1})}$$
Two correlated variables
Berge derived an inequality for two correlated variables X₁ and X₂.[9] Let ρ be the correlation coefficient between X₁ and X₂ and let σᵢ² be the variance of Xᵢ. Then
$$P\left(\bigcap_{i=1}^2 \left[\frac{|X_i - \mu_i|}{\sigma_i} < k\right]\right) \geq 1 - \frac{1 + \sqrt{1 - \rho^2}}{k^2}$$
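Berge's bound is easy to evaluate for different correlations. The following sketch is illustrative: the function name is an assumption, and negative values are clipped to zero because they say nothing about a probability.

```python
import math

def berge_lower_bound(k, rho):
    """Berge's lower bound on P(|Z1| < k and |Z2| < k) for standardised variables."""
    if not -1 <= rho <= 1:
        raise ValueError("rho must be a correlation coefficient")
    return max(0.0, 1 - (1 + math.sqrt(1 - rho**2)) / k**2)

for rho in (0.0, 0.5, 0.9, 1.0):
    print(f"rho = {rho}: P(both within 2 SDs) >= {berge_lower_bound(2, rho):.4f}")
```

At ρ = ±1 the two variables move together and the bound reduces to the univariate 1 − 1/k². At ρ = 0 it is 1 − 2/k².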
Lal later obtained an alternative bound[10]
$$P\left(\bigcap_{i=1}^2 \left[\frac{|X_i - \mu_i|}{\sigma_i} \leq k_i\right]\right) \geq 1 - \frac{k_1^2 + k_2^2 + \sqrt{(k_1^2 + k_2^2)^2 - 4 k_1^2 k_2^2 \rho}}{2(k_1 k_2)^2}$$
Isii derived a further generalisation.[11] Let

with 0 < k1 ≤ k2.
There are now three cases.
Case A: If
and
where

then

Case B: If the conditions in case A are not met but k1k2 ≥ 1 and

then

Case C: If the conditions in cases A or B are not met there is no universal bound other than 1.
Multivariate case
The general case is known as the Birnbaum–Raymond–Zuckerman inequality after the authors who proved it for two dimensions.[12]
$$P\left[\sum_{i=1}^n \frac{(X_i - \mu_i)^2}{\sigma_i^2 t_i^2} \geq k^2\right] \leq \frac{1}{k^2} \sum_{i=1}^n \frac{1}{t_i^2}$$
where Xᵢ is the ith random variable, μᵢ is the ith mean and σᵢ² is the ith variance.
If the variables are independent this inequality can be sharpened.[13]
$$P\left[\bigcap_{i=1}^n \frac{|X_i - \mu_i|}{\sigma_i} \leq k_i\right] \geq \prod_{i=1}^n \left(1 - \frac{1}{k_i^2}\right)$$
Olkin and Pratt derived an inequality for n correlated variables.[14]
$$P\left(\bigcap_{i=1}^n \frac{|X_i - \mu_i|}{\sigma_i} < k_i\right) \geq 1 - \frac{\left[\sqrt{u} + \sqrt{n-1} \sqrt{n \sum_i \frac{1}{k_i^2} - u}\right]^2}{n^2}$$
where the sum is taken over the n variables and

where ρᵢⱼ is the correlation between Xᵢ and Xⱼ.
Olkin and Pratt's inequality was subsequently generalised by Godwin.[15]
Vector version
Ferentinos[8] has shown that for a vector X = (x₁, x₂, x₃, ...) with mean μ = (μ₁, μ₂, μ₃, ...), variance σ² = (σ₁², σ₂², σ₃², ...) and an arbitrary norm || · || that

A second related inequality has also been derived.[16] Let N be the dimension of the stochastic vector X and let E[X] be the mean of X. Let S be the covariance matrix and k > 0. Then
$$P\left((X - E[X])^T S^{-1} (X - E[X]) < k\right) \geq 1 - \frac{N}{k},$$
where Yᵀ is the transpose of Y.
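Higher moments
An extension to higher moments is also possible:
$$\Pr(|X - \operatorname{E}(X)| \geq k) \leq \frac{\operatorname{E}(|X - \operatorname{E}(X)|^n)}{k^n},$$
where k > 0 and n ≥ 2.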
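Exponential version
A related inequality sometimes known as the exponential Chebyshev's inequality[17] is the inequality
$$P(X \geq \varepsilon) \leq e^{-t\varepsilon} \operatorname{E}(e^{tX}),$$
where t > 0.
Let K(x, t) be the cumulant generating function,
$$K(x, t) = \log \operatorname{E}(e^{tx}).$$
Taking the Legendre–Fenchel transformation[clarification needed] of K(x, t) and using the exponential Chebyshev's inequality we have
$$-\log P(X \geq \varepsilon) \geq \sup_{t > 0} \left(t\varepsilon - K(x, t)\right).$$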
This inequality may be used to obtain exponential inequalities for unbounded variables.[18]
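The exponential bound is usually optimised over t. The sketch below assumes the bound has the standard form e^(−tε) E(e^(tX)) for t > 0 and, purely for illustration, takes X to be standard normal so that E(e^(tX)) = e^(t²/2). The grid search and all names are assumptions.

```python
import math

def exponential_bound(eps, mgf, t_grid):
    """Smallest value of exp(-t*eps) * E[exp(t*X)] over the supplied t > 0."""
    return min(math.exp(-t * eps) * mgf(t) for t in t_grid)

def normal_mgf(t):
    """Moment generating function of a standard normal variable (assumed example)."""
    return math.exp(t * t / 2)

t_grid = [i / 100 for i in range(1, 1001)]   # t = 0.01, 0.02, ..., 10.00

for eps in (1, 2, 3):
    best = exponential_bound(eps, normal_mgf, t_grid)
    print(f"eps = {eps}: optimised bound {best:.5f}, Chebyshev-style 1/eps^2 = {1 / eps**2:.5f}")
```

For the standard normal the minimum is attained at t = ε and equals e^(−ε²/2), which decays much faster in ε than 1/ε².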
Inequalities for bounded variables
If P(x) has finite support based on the interval [a, b], let M = max( |a|, |b| ) where |x| is the absolute value of x. If the mean of P(x) is zero then for all k > 0[19]
$$\frac{\operatorname{E}(|X|^r) - k^r}{M^r} \leq P(|X| \geq k) \leq \frac{\operatorname{E}(|X|^r)}{k^r}.$$
The second of these inequalities with r = 2 is the Chebyshev bound. The first provides a lower bound for the value of P(|X| ≥ k).
Sharp bounds for a bounded variate have been derived by Niemitalo.[20]
Let 0 ≤ X ≤ M where M > 0. Then
- Case 1

- Case 2

$$\text{if } [E(X) > k \text{ and } E(X^2) \geq kE(X) + ME(X) - kM] \text{ or } [E(X) \leq k \text{ and } E(X^2) \geq kE(X)]$$
- Case 3

Finite samples
Saw et al. extended Chebyshev's inequality to cases where the population mean and variance are not known and are instead replaced by their sample estimates.[21]

where N is the sample size, m is the sample mean, k is a constant and s is the sample standard deviation. g(x) is defined as follows:
Let x ≥ 1, Q = N + 1, and R be the greatest integer less than Q / x. Let

Now



This inequality holds even when the population moments do not exist and when the sample is only weakly exchangeably distributed.
Kabán gives a somewhat less complex version of this inequality.[22]
$$P(|X - m| \geq ks) \leq \frac{1}{[N(N+1)]^{1/2}} \left[\left(\frac{N-1}{k^2} + 1\right)\right]$$
If the standard deviation is a multiple of the mean then a further inequality can be derived,[22]

A table of values for the Saw–Yang–Mo inequality for finite sample sizes (n < 100) has been determined by Konijn.[23]
For fixed N and large m the Saw–Yang–Mo inequality is approximately[24]

Beasley et al. have suggested a modification of this inequality[24]

In empirical testing this modification is conservative but appears to have low statistical power. Its theoretical basis currently remains unexplored.
Dependence on sample size
The bounds these inequalities give on a finite sample are less tight than those the Chebyshev inequality gives for a distribution. To illustrate this let the sample size n = 100 and let k = 3. Chebyshev's inequality states that at most approximately 11.11% of the distribution will lie outside these limits. Kabán's version of the inequality for a finite sample states that at most approximately 12.05% of the sample lies outside these limits. The dependence of the confidence intervals on sample size is further illustrated below.
For N = 10, the 95% confidence interval is approximately ±13.5789 standard deviations.
For N = 100 the 95% confidence interval is approximately ±4.9595 standard deviations; the 99% confidence interval is approximately ±140.0 standard deviations.
For N = 500 the 95% confidence interval is approximately ±4.5574 standard deviations; the 99% confidence interval is approximately ±11.1620 standard deviations.
For N = 1000 the 95% and 99% confidence intervals are approximately ±4.5141 and approximately ±10.5330 standard deviations respectively.
The Chebyshev inequality for the distribution gives 95% and 99% confidence intervals of approximately ±4.472 standard deviations and ±10 standard deviations respectively.
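The interval half-widths can be obtained by solving a bound for k at a given tail probability. The sketch below inverts the simplified finite-sample bound in the form given for Kabán's version in the Finite samples section, and it reproduces the 95% figures quoted for N = 100, 500 and 1000. The function names are illustrative, and other variants of the bound may give somewhat different numbers.

```python
import math

def kaban_bound(n, k):
    """Upper bound on P(|X - m| >= k*s) for sample size n (simplified form)."""
    return ((n - 1) / k**2 + 1) / math.sqrt(n * (n + 1))

def kaban_k(n, alpha):
    """Solve kaban_bound(n, k) == alpha for k; None if no finite k exists."""
    slack = alpha * math.sqrt(n * (n + 1)) - 1
    if slack <= 0:
        return None   # the bound never drops to alpha for this sample size
    return math.sqrt((n - 1) / slack)

def chebyshev_k(alpha):
    """k at which the distribution-level bound 1/k^2 equals alpha."""
    return 1 / math.sqrt(alpha)

for n in (100, 500, 1000):
    print(f"N = {n}: 95% half-width {kaban_k(n, 0.05):.4f} SDs "
          f"(distribution limit {chebyshev_k(0.05):.4f})")
```

As N grows, the finite-sample half-width approaches the distribution-level value 1/√α from above.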
Comparative bounds
Although Chebyshev's inequality is the best possible bound for an arbitrary distribution, this is not necessarily true for finite samples. Samuelson's inequality states that all values of a sample will lie within √(N − 1) standard deviations of the mean. Chebyshev's bound improves as the sample size increases.
When N = 10, Samuelson's inequality states that all members of the sample lie within 3 standard deviations of the mean; in contrast, Chebyshev's inequality states that 95% of the sample lies within 13.5789 standard deviations of the mean.
When N = 100, Samuelson's inequality states that all members of the sample lie within approximately 9.9499 standard deviations of the mean; Chebyshev's inequality states that 99% of the sample lies within 140.0 standard deviations of the mean.
When N = 500, Samuelson's inequality states that all members of the sample lie within approximately 22.3383 standard deviations of the mean; Chebyshev's inequality states that 99% of the sample lies within 11.1620 standard deviations of the mean.
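Samuelson's bound can be checked directly on data. The sketch below assumes the standard deviation is computed with divisor N, the convention under which the limit is √(N − 1), and uses a hypothetical sample chosen to reach the limit. The names are illustrative.

```python
import math
from statistics import fmean, pstdev

def max_standardised_deviation(sample):
    """Largest |x - mean| / s, with s computed using divisor N (assumption)."""
    m = fmean(sample)
    s = pstdev(sample, m)
    return max(abs(x - m) for x in sample) / s

sample = [0] * 9 + [1]     # hypothetical sample with one extreme value
n = len(sample)
print(f"largest deviation: {max_standardised_deviation(sample):.4f} SDs")
print(f"Samuelson limit:   {math.sqrt(n - 1):.4f} SDs")
```

A sample in which all but one value coincide attains the limit, so for N = 10 no value can ever be more than 3 standard deviations from the mean.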
It is likely that better bounds for finite samples than these exist.
Sharpened bounds
Chebyshev's inequality is important because of its applicability to any distribution. As a result of its generality it may not (and usually does not) provide as sharp a bound as alternative methods that can be used if the distribution of the random variable is known. To improve the sharpness of the bounds provided by Chebyshev's inequality a number of methods have been developed.
Standardised variables
Sharpened bounds can be derived by first standardising the random variable.[25]
Let X be a random variable with finite variance Var(X). Let Z be the standardised form defined as
$$Z = \frac{X - \operatorname{E}(X)}{\operatorname{Var}(X)^{1/2}}.$$
Cantelli's lemma is then
$$P(Z \geq k) \leq \frac{1}{1 + k^2}.$$
This inequality is sharp and is attained by k and −1/k with probability 1/(1 + k²) and k²/(1 + k²) respectively.
If k > 1 and the distribution of X is symmetric then we have
$$P(Z \geq k) \leq \frac{1}{2k^2}.$$
Equality holds if and only if Z = −k, 0 or k with probabilities 1/(2k²), 1 − 1/k² and 1/(2k²) respectively.[25] An extension to a two-sided inequality is also possible.
Let u, v > 0. Then we have[25]

Semivariances
An alternative method of obtaining sharper bounds is through the use of semivariances (partial moments). The upper (σ₊²) and lower (σ₋²) semivariances are defined as


where m is the arithmetic mean of the sample, n is the number of elements in the sample and the sum for the upper (lower) semivariance is taken over the elements greater (less) than the mean.
The variance of the sample is the sum of the two semivariances
$$\sigma^2 = \sigma_+^2 + \sigma_-^2.$$
In terms of the lower semivariance Chebyshev's inequality can be written[26]

Putting

Chebyshev's inequality can now be written

A similar result can also be derived for the upper semivariance.
If we put

Chebyshev's inequality can be written

Because σᵤ² ≤ σ², use of the semivariance sharpens the original inequality.
If the distribution is known to be symmetric, then

and

This result agrees with that derived using standardised variables.
- Note
- The inequality with the lower semivariance has been found to be of use in estimating downside risk in finance and agriculture.[26][27][28]
Selberg's inequality
Selberg derived an inequality for P(x) when a ≤ x ≤ b.[29] To simplify the notation let

where

and

The result of this linear transformation is to make P(a ≤ X ≤ b) equal to P(|Y| ≤ k).
The mean (μX) and variance (σX) of X are related to the mean (μY) and variance (σY) of Y:


With this notation Selberg's inequality states that



These are known to be the best possible bounds.[30]
Cantelli's inequality
Cantelli's inequality,[31] due to Francesco Paolo Cantelli, states that for a real random variable X with mean μ and variance σ²,
$$P(X - \mu \geq a) \leq \frac{\sigma^2}{\sigma^2 + a^2},$$
where a ≥ 0.
This inequality can be used to prove a one-tailed variant of Chebyshev's inequality with k > 0:[32]
$$P(X - \mu \geq k\sigma) \leq \frac{1}{1 + k^2}.$$
The bound on the one-tailed variant is known to be sharp. To see this, consider the random variable X that takes the values
$$X = 1 \text{ with probability } \frac{\sigma^2}{1 + \sigma^2},$$
$$X = -\sigma^2 \text{ with probability } \frac{1}{1 + \sigma^2}.$$
Then E(X) = 0 and E(X²) = σ² and P(X < 1) = 1/(1 + σ²).
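The sharpness claim can be verified with exact fractions. The sketch below assumes the two-point distribution takes the value 1 with probability σ²/(1 + σ²) and the value −σ² with probability 1/(1 + σ²), which is the choice consistent with E(X) = 0, E(X²) = σ² and P(X < 1) = 1/(1 + σ²). The function names are illustrative.

```python
from fractions import Fraction

def cantelli_bound(variance, a):
    """One-sided bound on P(X - mu >= a) for a >= 0."""
    v = Fraction(variance)
    return v / (v + Fraction(a) ** 2)

def two_point_check(variance):
    """Return (mean, second moment, P(X - mean >= 1)) for the two-point example."""
    v = Fraction(variance)
    dist = {Fraction(1): v / (1 + v), -v: 1 / (1 + v)}
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    upper_tail = sum(p for x, p in dist.items() if x - mean >= 1)
    return mean, second_moment, upper_tail

for variance in (1, 4, Fraction(1, 2)):
    mean, second_moment, upper_tail = two_point_check(variance)
    print(f"variance {variance}: tail {upper_tail}, bound {cantelli_bound(variance, 1)}")
```

The upper-tail probability at a = 1 coincides with σ²/(σ² + 1), so the one-sided bound cannot be improved without further assumptions.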
- An application – distance between the mean and the median
The one-sided variant can be used to prove the proposition that for probability distributions having an expected value and a median, the mean and the median can never differ from each other by more than one standard deviation. To express this in symbols let μ, ν, and σ be respectively the mean, the median, and the standard deviation. Then
$$|\mu - \nu| \leq \sigma.$$
There is no need to assume that the variance is finite because this inequality is trivially true if the variance is infinite.
The proof is as follows. Setting k = 1 in the statement for the one-sided inequality gives:
$$\Pr(X - \mu \geq \sigma) \leq \frac{1}{2}.$$
Changing the sign of X and of μ, we get
$$\Pr(X \leq \mu - \sigma) \leq \frac{1}{2}.$$
Thus the median is within one standard deviation of the mean.
For a proof using Jensen's inequality see An inequality relating means and medians.
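The mean and median bound also holds for the empirical distribution of any data set, which gives a quick numerical check. The sketch below uses hypothetical, strongly skewed data and the population standard deviation (divisor N). The function name is illustrative.

```python
from statistics import fmean, median, pstdev

def mean_median_gap(data):
    """Return (|mean - median|, population standard deviation) for a data set."""
    m = fmean(data)
    return abs(m - median(data)), pstdev(data, m)

skewed = [1, 1, 1, 1, 100]     # hypothetical data with a long right tail
gap, sd = mean_median_gap(skewed)
print(f"|mean - median| = {gap:.1f}, standard deviation = {sd:.1f}")
```

Even with one extreme value pulling the mean far from the median, the gap stays below one standard deviation.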
Bhattacharyya's inequality
Bhattacharyya[33] extended Cantelli's inequality using the third and fourth moments of the distribution.
Let μ = 0 and σ² be the variance. Let γ = E(X³)/σ³ and κ = E(X⁴)/σ⁴.
If k² − kγ − 1 > 0 then

The necessity of k² − kγ − 1 > 0 requires that k be reasonably large.
Mitzenmacher and Upfal's inequality
Mitzenmacher and Upfal[34] note that
$$[X - E(X)]^{2k} \geq 0$$
for any real k > 0 and that
$$E\left([X - E(X)]^{2k}\right)$$
is the 2kth central moment. They then show that for t > 0
$$P\left(|X - E(X)| > t\left[E\left((X - E(X))^{2k}\right)\right]^{1/(2k)}\right) \leq \min\left[1, \frac{1}{t^{2k}}\right].$$
For k = 1 we obtain Chebyshev's inequality. For t ≥ 1, k > 1 and assuming that the 2kth moment exists, this bound is tighter than Chebyshev's inequality.
Related inequalities
Several other related inequalities are also known.
Zelen's inequality
Zelen has shown that[35]
$$P(X - \mu \geq k\sigma) \leq \left[1 + k^2 + \frac{(k^2 + k\theta_3 - 1)^2}{\theta_4 - \theta_3^2 - 1}\right]^{-1}$$
with

and

where Mₘ is the mth moment and σ is the standard deviation.
Source: this wikipedia article, under CC-BY-SA.