# Zipf-Mandelbrot law

| Property | Value |
| --- | --- |
| Parameters | ${\displaystyle N\in \{1,2,3\ldots \}}$ (integer), ${\displaystyle q\in [0;\infty )}$ (real), ${\displaystyle s>0\,}$ (real) |
| Support | ${\displaystyle k\in \{1,2,\ldots ,N\}}$ |
| Probability mass function | ${\displaystyle {\frac {1/(k+q)^{s}}{H_{N,q,s}}}}$ |
| Cumulative distribution function | ${\displaystyle {\frac {H_{k,q,s}}{H_{N,q,s}}}}$ |
| Mean | ${\displaystyle {\frac {H_{N,q,s-1}}{H_{N,q,s}}}-q}$ |
| Mode | ${\displaystyle 1\,}$ |

In probability theory and statistics, the Zipf-Mandelbrot law is a discrete probability distribution. Also known as the Pareto-Zipf law, it is a power-law distribution on ranked data, named after the Harvard linguistics professor George Kingsley Zipf (1902-1950), who proposed a simpler distribution known as Zipf's law, and the mathematician Benoît Mandelbrot (1924-2010), who subsequently generalized it.

The probability mass function is given by:

${\displaystyle f(k;N,q,s)={\frac {1/(k+q)^{s}}{H_{N,q,s}}}}$

where ${\displaystyle H_{N,q,s}}$ is given by:

${\displaystyle H_{N,q,s}=\sum _{i=1}^{N}{\frac {1}{(i+q)^{s}}}}$

which may be thought of as a generalization of a harmonic number. For ${\displaystyle s>1}$, in the limit as ${\displaystyle N}$ approaches infinity, the sum converges to the Hurwitz zeta function ${\displaystyle \zeta (s,q+1)}$, because the summation index starts at ${\displaystyle i=1}$. For finite ${\displaystyle N}$ and ${\displaystyle q=0}$ the Zipf-Mandelbrot law reduces to Zipf's law. For infinite ${\displaystyle N}$, ${\displaystyle q=0}$ and ${\displaystyle s>1}$ it becomes the zeta distribution.
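The definitions above translate directly into code. The following is a minimal Python sketch rather than a reference implementation: the function names (`generalized_harmonic`, `zipf_mandelbrot_pmf`, `zipf_mandelbrot_cdf`, `zipf_mandelbrot_mean`) are hypothetical, and plain direct summation is assumed to be adequate, which is reasonable only for modest ${\displaystyle N}$.

```python
def generalized_harmonic(n, q, s):
    """Return H_{n,q,s} = sum_{i=1}^{n} 1 / (i + q)**s by direct summation."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))


def zipf_mandelbrot_pmf(k, n, q, s):
    """Probability of rank k under the Zipf-Mandelbrot law with parameters n, q, s."""
    if not 1 <= k <= n:
        return 0.0  # outside the support {1, ..., n}
    return (1.0 / (k + q) ** s) / generalized_harmonic(n, q, s)


def zipf_mandelbrot_cdf(k, n, q, s):
    """Cumulative probability H_{k,q,s} / H_{n,q,s} for integer k."""
    if k < 1:
        return 0.0
    k = min(k, n)  # the CDF equals 1 at and beyond the largest rank
    return generalized_harmonic(k, q, s) / generalized_harmonic(n, q, s)


def zipf_mandelbrot_mean(n, q, s):
    """Mean rank, H_{n,q,s-1} / H_{n,q,s} - q."""
    return generalized_harmonic(n, q, s - 1) / generalized_harmonic(n, q, s) - q
```

Setting `q = 0` in these functions gives the corresponding quantities for Zipf's law, and summing `zipf_mandelbrot_pmf` over all ranks from 1 to `n` should return 1 up to floating-point rounding.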

## Applications

The frequency of words in a corpus of writing, ranked from most to least common, generally follows a power law known as Zipf's law.

If one plots the frequency rank of the words in a large corpus of text against their number of occurrences (or their relative frequencies), one obtains a power-law relationship with an exponent close to one (but see Gelbukh and Sidorov 2001).
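The rank-frequency relationship described above can be explored with a short Python sketch. It is illustrative only: the helper names `rank_frequency` and `loglog_slope` are hypothetical, tokens are assumed to be already split into words, and an ordinary least-squares fit on log-log axes is used as a crude estimate of the exponent, not as the method of any cited work.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return occurrence counts sorted from most to least frequent.

    The count at index r - 1 belongs to the word of rank r.
    """
    return [count for _, count in Counter(tokens).most_common()]


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Requires at least two strictly positive frequencies. A slope near -1
    corresponds to a power law with exponent close to one.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance
```

For a real corpus one would call `loglog_slope(rank_frequency(tokens))` on the tokenized text. The estimate is sensitive to the low-frequency tail, where many words share the same count, so it is best treated as a rough diagnostic.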