<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="pmathml.xsl"?>
<!-- Next line to please Opera browser -->
<?xml-stylesheet type="text/css" href="pmathmlcss.css"?>
<!--
  pref:renderer="techexplorer-plugin"
  pref:renderer="techexplorer"
  pref:renderer="css"
  pref:renderer="mathplayer"
  pref:renderer="mathplayer-dl"
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html" />
  <title>Computing the weight of a sequence in a text</title>
</head>

<body>
<h2>Computing the weight of a sequence in a text</h2>

<p>The repetitiveness checker assigns a high weight to a sequence if the
sequence occurs more often than would be expected under the assumption that a
text is an incoherent list of words. In this way, the weight says something
about the tendency of words in a text to occur together with their friends,
even when those friends are few in the text and there would be no a priori
reason to expect them to occur together.
</p>

<p>This is how the repetitiveness checker computes the probability that a
candidate sequence occurs a certain number of times:</p>

<p>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <msubsup>
    <mi>P</mi>
    <mi>n</mi>
    <mi>m</mi>
  </msubsup>

  <mrow>
    <mo>(</mo>
    <msub>
      <mi>x</mi>
      <mn>1</mn>
    </msub>
    <mo>.</mo>

    <mo>.</mo>
    <mo>.</mo>
    <msub>
      <mi>x</mi>
      <mi>l</mi>
    </msub>
    <mo>)</mo>

  </mrow>
  <mo>=</mo>
  <mfrac>
    <mrow>
      <munderover>
        <mo>&#x0220F;</mo>
        <mrow>
          <mi>j</mi>

          <mo>=</mo>
          <mn>1</mn>
        </mrow>
        <mi>m</mi>
      </munderover>
      <mrow>
        <mo>(</mo>

        <mi>n</mi>
        <mo>&#x2212;</mo>
        <mi>j</mi>
        <mi>l</mi>
        <mo>+</mo>
        <mn>1</mn>

        <mo>)</mo>
      </mrow>
    </mrow>
    <mrow>
      <mi>m</mi>
      <mo>!</mo>
    </mrow>

  </mfrac>
  <msup>
    <mrow>
      <mo>(</mo>
      <msub>
        <mi>P</mi>
        <mrow>
          <msub>

            <mi>x</mi>
            <mn>1</mn>
          </msub>
          <mo>.</mo>
          <mo>.</mo>
          <mo>.</mo>

          <msub>
            <mi>x</mi>
            <mi>l</mi>
          </msub>
        </mrow>
      </msub>
      <mo>)</mo>

    </mrow>
    <mi>m</mi>
  </msup>
  <mo>&#x22C5;</mo>
  <msup>
    <mrow>
      <mo>(</mo>

      <mn>1</mn>
      <mo>&#x2212;</mo>
      <msub>
        <mi>P</mi>
        <mrow>
          <msub>
            <mi>x</mi>

            <mn>1</mn>
          </msub>
          <mo>.</mo>
          <mo>.</mo>
          <mo>.</mo>
          <msub>
            <mi>x</mi>

            <mi>l</mi>
          </msub>
        </mrow>
      </msub>
      <mo>)</mo>
    </mrow>
    <mrow>
      <mi>n</mi>

      <mo>&#x2212;</mo>
      <mi>m</mi>
      <mi>l</mi>
    </mrow>
  </msup>
</math> </p>
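
<p>As a minimal sketch, the formula above can be evaluated numerically as
follows. The function name <code>occurrence_probability</code> and the
parameter <code>p</code> (the probability that the sequence occurs at a given
position) are illustrative assumptions, not names taken from the checker. The
sketch assumes that m times l does not exceed n.</p>
<pre>
```python
from math import factorial, prod


def occurrence_probability(n, m, l, p):
    """Evaluate P_n^m for a sequence of l words with per-position probability p.

    n: number of tokens in the text
    m: number of occurrences of the sequence
    l: length of the sequence in words
    p: probability that the sequence occurs at a given position (assumed given)
    """
    # Product over j = 1..m of (n - j*l + 1), divided by m!
    placements = prod(n - j * l + 1 for j in range(1, m + 1)) / factorial(m)
    # p^m for the m occurrences, (1 - p)^(n - m*l) for the remaining tokens
    return placements * p ** m * (1 - p) ** (n - m * l)


# Hypothetical values: a 2-word sequence occurring once in a 10-token text
print(occurrence_probability(10, 1, 2, 0.01))
```
</pre>
<p>For m = 0 the product is empty, so the result reduces to the probability
that the sequence occurs at none of the n positions.</p>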

<p>In this formula, 
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <msubsup>
    <mi>P</mi>
    <mi>n</mi>
    <mi>m</mi>
  </msubsup>

  <mrow>
    <mo>(</mo>
    <msub>
      <mi>x</mi>
      <mn>1</mn>
    </msub>
    <mo>.</mo>

    <mo>.</mo>
    <mo>.</mo>
    <msub>
      <mi>x</mi>
      <mi>l</mi>
    </msub>
    <mo>)</mo>

  </mrow>
</math>
is the probability that a sequence of 
<math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>l</mi>
</math>
words 
<math xmlns="http://www.w3.org/1998/Math/MathML">
    <msub>
      <mi>x</mi>
      <mn>1</mn>
    </msub>
    <mo>.</mo>

    <mo>.</mo>
    <mo>.</mo>
    <msub>
      <mi>x</mi>
      <mi>l</mi>
    </msub>
</math>

occurs 
<math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>m</mi>
</math>

times in a text with 

<math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>n</mi>
</math>
 tokens, whereas
<math xmlns="http://www.w3.org/1998/Math/MathML">
      <msub>
        <mi>P</mi>
        <mrow>
          <msub>

            <mi>x</mi>
            <mn>1</mn>
          </msub>
          <mo>.</mo>
          <mo>.</mo>
          <mo>.</mo>

          <msub>
            <mi>x</mi>
            <mi>l</mi>
          </msub>
        </mrow>
      </msub>
</math>
is the probability that the sequence of words
<math xmlns="http://www.w3.org/1998/Math/MathML">
    <msub>
      <mi>x</mi>
      <mn>1</mn>
    </msub>
    <mo>.</mo>

    <mo>.</mo>
    <mo>.</mo>
    <msub>
      <mi>x</mi>
      <mi>l</mi>
    </msub>
</math>
occurs at a given position. This probability is simply the product of the
probabilities of the words in the sequence:
<math xmlns="http://www.w3.org/1998/Math/MathML">

  <mrow>

      <msub>
        <mi>P</mi>
        <mrow>
          <msub>

            <mi>x</mi>
            <mn>1</mn>
          </msub>
          <mo>.</mo>
          <mo>.</mo>
          <mo>.</mo>

          <msub>
            <mi>x</mi>
            <mi>l</mi>
          </msub>
        </mrow>
      </msub>
  </mrow>
  <mo>=</mo>
    <mrow>
      <munderover>
        <mo>&#x0220F;</mo>
        <mrow>
          <mi>i</mi>

          <mo>=</mo>
          <mn>1</mn>
        </mrow>
        <mi>l</mi>
      </munderover>
      <mrow>
      <msub>
        <mi>P</mi>
        <mrow>
          <msub>

            <mi>x</mi>
            <mi>i</mi>
          </msub>
        </mrow>
      </msub>
      </mrow>
    </mrow>
</math>
, where 
<math xmlns="http://www.w3.org/1998/Math/MathML">
      <msub>
        <mi>P</mi>
        <mrow>
          <msub>

            <mi>x</mi>
            <mi>i</mi>
          </msub>
        </mrow>
      </msub>
</math>
is the probability that a word at a given position is the word 
<math xmlns="http://www.w3.org/1998/Math/MathML">
          <msub>

            <mi>x</mi>
            <mi>i</mi>
          </msub>
</math>
.</p>
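
<p>The product of word probabilities described above might be sketched as
follows. How the checker obtains the individual word probabilities is not
stated here; estimating them as relative frequencies in the text is an
assumption made only for this illustration, and the names
<code>word_probabilities</code> and <code>sequence_probability</code> are
hypothetical.</p>
<pre>
```python
from collections import Counter
from math import prod


def word_probabilities(tokens):
    """Estimate the probability of each word as its relative frequency.

    This estimator is an assumption made for illustration only.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def sequence_probability(sequence, probabilities):
    """Product of the word probabilities P_{x_i} for i = 1..l."""
    return prod(probabilities[word] for word in sequence)


# Made-up example text of 10 tokens
tokens = "the cat sat on the mat and the cat slept".split()
probabilities = word_probabilities(tokens)
p = sequence_probability(["the", "cat"], probabilities)
print(p)
```
</pre>
<p>In this sketch, a word that does not occur in the text raises a
<code>KeyError</code>, because no probability has been estimated for it.</p>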

</body>
</html>
