This folder contains the spelling errors that have been extracted from the MERLIN corpus.

### MERLIN CORPUS ###

http://www.merlin-platform.eu/

Abel, Andrea; Wisniewski, Katrin; Nicolas, Lionel; Boyd, Adriane; Hana, Jirka; Meurers, Detmar (2014): A Trilingual Learner Corpus illustrating European Reference Levels. In: Ricognizioni – Rivista di Lingue, Letterature e Culture Moderne 2 (1), 111-126.  

Katrin Wisniewski, Karin Schöne, Lionel Nicolas, Chiara Vettori, Adriane Boyd, Detmar Meurers, Andrea Abel, Jirka Hana. MERLIN: An online trilingual learner corpus empirically grounding the European Reference Levels in authentic learner data. Proceedings of the Conference ICT for Language Learning 2013, Florence, Italy, November 14-15, 2013.

### SPELLING ERROR EXTRACTION ###

If you use the spelling errors in your work, please also cite:

Lisa Beinborn, Torsten Zesch, Iryna Gurevych. Predicting the Spelling Difficulty of Words for Language Learners. In: Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications held in conjunction with NAACL 2016, p. to appear, 2016.

The folder "DE" contains spelling errors that have been extracted from learner essays in German; "IT" contains spelling errors that have been extracted from learner essays in Italian.

For each language, the file "spellingErrors.csv" contains all errors in the corpus that have been annotated with the tag "O_graph". The files "filteredSpellingErrors.csv", "spellingErrorProbabilities.csv" and "tokenfrequency_all.dist" have been obtained after pre-processing the data as described in Beinborn et al. 2016.
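
As a starting point for working with these files, the following Python sketch reads one CSV file from a language folder. It is only an illustration: the function name `load_rows` is hypothetical, and the comma delimiter and UTF-8 encoding are assumptions, since the column layout of the files is not described in this README. Rows are therefore returned as plain lists of strings without interpreting any column.

```python
import csv
from pathlib import Path

# Folder and file names as listed in this README.
LANGUAGES = ("DE", "IT")
ERROR_FILE = "spellingErrors.csv"


def load_rows(root, language, filename=ERROR_FILE, delimiter=","):
    """Read one CSV file from a language folder and return its rows.

    The delimiter and the UTF-8 encoding are assumptions; adjust them
    if the files use a different separator or encoding.
    """
    path = Path(root) / language / filename
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter))
```

Called from this folder, `load_rows(".", "DE")` would return the rows of "DE/spellingErrors.csv"; passing `filename="filteredSpellingErrors.csv"` selects the filtered file instead.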

The folder "TrainTestData" contains the training and test data for the prediction experiments. 

We would be happy if you let us know when you use the spelling errors for further research:
beinborn [at] ukp.informatik.tu-darmstadt.de

