Studies & Works


Lexicographers' Calculator


The program is based on the algorithms described by Jiří Milička in his article "Rank-frequency Relation and Type-token Relation: Two Sides of the Same Coin". The algorithms are valid only for large corpora (the number of tokens should exceed 100,000,000).

The application helps you plan the size of a new corpus.

  1. Specify the number of tokens (or "positions") in your corpus or corpora.
  2. Specify the number of types (word forms or lemmas) in your corpus or corpora.
  3. Specify the number of hapax legomena (i.e. types that occur only once) in your corpus or corpora.
  4. Fill in any two of the following three values:
    a) The number of tokens in your planned corpus, if you need b) types of frequency c)
    b) The number of types of frequency c) in a corpus of size a)
    c) The frequency of b) types in a corpus that contains a) tokens
  5. The remaining value is calculated when you press the main button.
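
The inputs for steps 1 to 3 can be counted from an existing tokenized corpus. The following is a minimal Python sketch of that counting, plus a check that exactly two of the values a), b), c) are filled in. It does not reproduce the calculator's algorithm. The names `corpus_counts`, `value_to_calculate` and `MIN_TOKENS` are hypothetical and chosen only for illustration, and the sample sentence is far below the 100,000,000-token threshold, so it only demonstrates the definitions.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Threshold stated above for the algorithms to be valid (name is hypothetical).
MIN_TOKENS = 100_000_000


def corpus_counts(tokens: Iterable[str]) -> Tuple[int, int, int]:
    """Return (tokens, types, hapax legomena) for a sequence of word forms or lemmas."""
    freq = Counter(tokens)
    n_tokens = sum(freq.values())                            # all positions
    n_types = len(freq)                                      # distinct items
    n_hapax = sum(1 for count in freq.values() if count == 1)  # items seen once
    return n_tokens, n_types, n_hapax


def value_to_calculate(a: Optional[int], b: Optional[int], c: Optional[int]) -> str:
    """Return the label of the single value left blank among a), b), c)."""
    missing = [name for name, value in (("a", a), ("b", b), ("c", c)) if value is None]
    if len(missing) != 1:
        raise ValueError("fill in exactly two of the three values a), b), c)")
    return missing[0]


sample = "the cat sat on the mat and the dog sat too".split()
n_tokens, n_types, n_hapax = corpus_counts(sample)
print(n_tokens, n_types, n_hapax)  # 11 8 6

if n_tokens <= MIN_TOKENS:
    print("corpus is too small for the algorithms to be valid")

# a) is left blank, so the planned corpus size is the value to be calculated.
print(value_to_calculate(None, 5000, 10))  # a
```

Counting with `Counter` keeps the three inputs consistent with each other, since all of them are derived from the same frequency table.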

The algorithm is computationally demanding, so please be patient (no progress indicator is shown).