## Computing the weight of a sequence in a text

The weight that the repetitiveness checker assigns to a sequence is high if the
sequence occurs more often than expected under the assumption that a text is an
incoherent list of words, that is, that each word is chosen independently of the
words around it. In this way, the weight reflects the tendency of words in a text
to occur together with their "friends", even when those friends are rare in the
text and there is no a priori reason to expect them to be together.

This is how the repetitiveness checker computes the probability that a
candidate sequence occurs a certain number of times:

$$P_{n}^{m}(x_1 \ldots x_l) = \frac{\prod_{j=1}^{m} (n - jl + 1)}{m!} \cdot \left(P_{x_1 \ldots x_l}\right)^{m} \cdot \left(1 - P_{x_1 \ldots x_l}\right)^{n - ml}$$

In this formula, $P_{n}^{m}(x_1 \ldots x_l)$ is the probability that a sequence
of $l$ words $x_1 \ldots x_l$ occurs $m$ times in a text with $n$ tokens, whereas
$P_{x_1 \ldots x_l}$ is the probability that the sequence of words
$x_1 \ldots x_l$ occurs at a given position. This probability is simply the
product of the probabilities of the words in the sequence:

$$P_{x_1 \ldots x_l} = \prod_{i=1}^{l} P_{x_i},$$

where $P_{x_i}$ is the probability that the word at a given position is $x_i$.
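
The two formulas can be combined in a short Python sketch. The text does not say how the checker estimates the word probabilities $P_{x_i}$; this sketch assumes they are the relative frequencies of the words in the text itself. The function names (`word_probabilities`, `sequence_probability`, `prob_m_occurrences`) are hypothetical illustrations, not part of the checker. The formula for $P_{n}^{m}$ is implemented literally, with one added guard that returns 0 when $m$ copies of an $l$-word sequence cannot fit into $n$ tokens.

```python
import math
from collections import Counter


def word_probabilities(tokens: list[str]) -> dict[str, float]:
    """Estimate P_{x_i} for every word type.

    Assumption (not stated in the text): P_{x_i} is the relative
    frequency of x_i among the n tokens of the text.
    """
    counts = Counter(tokens)
    n = len(tokens)
    return {word: count / n for word, count in counts.items()}


def sequence_probability(sequence: list[str], probs: dict[str, float]) -> float:
    """P_{x_1...x_l}: product of the probabilities of the words in the sequence."""
    # Words that never occur in the text get probability 0.
    return math.prod(probs.get(word, 0.0) for word in sequence)


def prob_m_occurrences(n: int, m: int, l: int, p_seq: float) -> float:
    """P_n^m(x_1...x_l): probability that an l-word sequence occurs m times in n tokens."""
    if m * l > n:
        # Guard added for this sketch: m copies of l words cannot fit into n tokens.
        return 0.0
    # Combinatorial factor: prod_{j=1}^{m} (n - j*l + 1), divided by m!
    placements = math.prod(n - j * l + 1 for j in range(1, m + 1))
    return placements / math.factorial(m) * p_seq**m * (1 - p_seq) ** (n - m * l)


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the cat slept".split()
    probs = word_probabilities(tokens)
    p = sequence_probability(["the", "cat"], probs)
    # Probability that "the cat" occurs twice in this 10-token text.
    print(p, prob_m_occurrences(len(tokens), 2, 2, p))
```

For long texts the factor $(1 - P_{x_1 \ldots x_l})^{n - ml}$ can become extremely small and the combinatorial factor extremely large; computing the terms as logarithms (for example with `math.log1p`) would be more robust, but is left out here to keep the correspondence with the formula visible.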