A cross-linguistic database of children's printed words in three Slavic languages
We describe a lexical database of morphologically and phonetically tagged words that occur in the texts primarily used for language arts instruction in the Czech Republic, Poland and Slovakia during the initial period of primary education (up to grade 4 or 5). The database aims to parallel the contents and usage of the British English Children's Printed Word Database. It contains words from the texts of the most widely used Czech, Polish and Slovak textbooks. The corpus is accessible via a simple WWW interface that supports regular expression searches and Boolean expressions across word forms, lemmas, morphological tags and phonemic transcriptions, and that provides useful statistics on the text words included. We anticipate extensive use of the database as a reference in the development of psychodiagnostic batteries for literacy impairments in the three languages, as well as for the creation of experimental materials in psycholinguistic research.
Language arts instruction, Slavic languages, Primary education, Psycholinguistic research
Garabík, R., Caravolas, M., Kessler, B., Höflerová, E., Masterson, J., Mikulajová, M., Szczerbiński, M., Wierzchoń, P. (2007). 'A cross-linguistic database of children's printed words in three Slavic languages'. In Levická, J., & Garabík, R. (Eds.). Computer Treatment of Slavic and East European Languages: Fourth International Seminar, Bratislava, Slovakia, 25–27 October 2007: Proceedings (pp. 51–64). Bratislava: Tribun.