Downloadable Linguistic Data
- Modern Russian Frequency List–
Serge Sharoff’s word frequency list based on a selection of works
in modern Russian
- Zaliznyak’s Morphological Dictionary–
This file is the basis for many implementations of Russian morphology.
It contains the word entries from an early edition of Zaliznyak’s
morphological dictionary, though some of the unusual characters used
in the printed dictionary have been replaced with ASCII substitutes.
(It does not include the information needed to interpret these entries.
For that you must refer to the printed dictionary or scans.)
Though this is ostensibly a text file, it is stored in what is now a very
- It is an installer which runs under Microsoft Windows to install the text files.
- The files it creates are encoded in the ALT encoding for MS-DOS, which is now little used.
- The already obscure ALT encoding is altered to provide accented vowels
at code points usually used for graphic characters
- The files contain typographical errors from the OCR process
- At the end of each line a Control-D character has been added, followed by what is apparently
intended as an English translation of the word, but these are often
For these reasons you would likely do better to use Odict.
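If you do work with the installed files anyway, the quirks listed above can be handled in a few lines. The following is a minimal Python sketch, not a tested converter. It assumes that the ALT encoding is close enough to Python's `cp866` codec, and that each line has the form entry, Control-D, gloss. The names `parse_line` and `ACCENT_MAP` are hypothetical, and the single mapping shown is only a placeholder: the real code points used for accented vowels have to be found by inspecting the file.

```python
# Hypothetical table: byte value in the file -> accented vowel.
# PLACEHOLDER ONLY. The real code points are not documented here;
# replace this entry after inspecting the actual data.
ACCENT_MAP = {
    0xB0: "а\u0301",  # assumed example: a graphic-character slot reused for stressed "а"
}

CTRL_D = b"\x04"  # Control-D, the separator placed before the English gloss


def parse_line(raw: bytes, accent_map=ACCENT_MAP):
    """Split one raw line into (entry, gloss) and decode both parts."""
    raw = raw.rstrip(b"\r\n")
    entry_bytes, _, gloss_bytes = raw.partition(CTRL_D)
    # Decode byte by byte so that remapped code points take priority
    # over the standard cp866 meaning of the same byte.
    chars = []
    for b in entry_bytes:
        if b in accent_map:
            chars.append(accent_map[b])
        else:
            chars.append(bytes([b]).decode("cp866"))
    gloss = gloss_bytes.decode("cp866", errors="replace").strip()
    return "".join(chars), gloss
```

Reading the file in binary mode (`open(path, "rb")`) and passing each line to `parse_line` keeps the altered bytes intact until they have been remapped. Decoding the whole file as `cp866` first would turn them into box-drawing characters.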
An expanded version of Zaliznyak’s dictionary which (as of April
2019) is being regularly updated. Users can contribute new word entries
using a web interface. Includes documentation of the format taken from
the foreword to Zaliznyak’s dictionary and adapted:
A Russian corpus with ambiguity resolved so that the identity of each
word and its morphological form is known
- OPUS–an Open Source Parallel Corpus–
Translated texts, aligned sentence by sentence. Can be searched on this and other
sites. Datasets can be downloaded so that you can use them with your
own tools. Texts come from sources including UN documents,
government publications, Wikipedia, movie subtitles, and multilingual
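Once a dataset has been downloaded, the sentence alignment can be used directly with your own tools. The following is a minimal Python sketch. It assumes the download is in a plain-text format made up of two line-aligned UTF-8 files, where line n of one file is the translation of line n of the other. The function name `read_aligned` and the file names in the usage note are hypothetical.

```python
from itertools import islice


def read_aligned(src_path, tgt_path, limit=None):
    """Yield (source, target) sentence pairs from two line-aligned files.

    limit=None reads every pair; otherwise at most `limit` pairs are read.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        # zip pairs line n of the first file with line n of the second.
        for s, t in islice(zip(src, tgt), limit):
            yield s.rstrip("\n"), t.rstrip("\n")
```

For example, `for ru, en in read_aligned("corpus.ru", "corpus.en", limit=5): print(ru, "|", en)` would print the first five pairs. Because `zip` stops at the shorter file, it is worth checking that both files have the same number of lines before relying on the alignment.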
Programs for Using Linguistic Data
–Morphological analyzer and inflection engine for the Russian
and Ukrainian languages. It is of good quality and fast, but does not
provide information about stressed syllables.
–A program written in Perl which takes the entries in Zaliznyak’s
dictionary and produces the full paradigm of each word. Unfortunately
the author has not posted all of the files, so the program does not really
work. The output files are posted, though, and these may be useful.