
dc.contributor.author: Lee, Ji-Ung
dc.contributor.author: Klie, Jan-Christoph
dc.contributor.author: Gurevych, Iryna
dc.date.accessioned: 2021-06-04T17:24:16Z
dc.date.available: 2021-06-04T17:24:16Z
dc.date.issued: 2021
dc.identifier.uri: https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2783
dc.description: Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming in the beginning, mentally taxing, and can introduce errors into the resulting annotations, especially in citizen science or crowdsourcing scenarios where domain expertise is not required and only annotation guidelines are provided. To alleviate these issues, we propose annotation curricula, a novel approach to implicitly train annotators. We gradually introduce annotators to the task by ordering the instances to be annotated according to a learning curriculum. To do so, we first formalize annotation curricula for sentence- and paragraph-level annotation tasks, define an ordering strategy, and identify well-performing heuristics and interactively trained models on three existing English datasets. We then conduct a user study with 40 voluntary participants who are asked to identify the most fitting misconception for English tweets about the Covid-19 pandemic. Our results show that using a simple heuristic to order instances can already significantly reduce the total annotation time while preserving high annotation quality. Annotation curricula can thus provide a novel way to improve data collection. To facilitate future research, we further share our code and data, consisting of 2,400 annotations. [en_US]
dc.language.iso: en [en_US]
dc.relation.isreferencedby: https://arxiv.org/abs/2106.02382
dc.rights: Creative Commons Attribution 4.0
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: NLP [en_US]
dc.subject: Annotation Curriculum [en_US]
dc.subject: Interactive Learning [en_US]
dc.subject: Semantic Similarity [en_US]
dc.subject.classification: 4.43-06 Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics [en_US]
dc.subject.ddc: 004
dc.title: Annotation Curricula to Implicitly Train Non-Expert Annotators [en_US]
dc.type: Dataset [en_US]
dc.type: Text [en_US]
tud.project: EU/EFRE | 20005482 | TexPrax - Gurevych [en_US]
tud.project: DFG | GU798/21-1 | Infrastruktur für in [en_US]
tud.unit: TUDa
tud.history.classification: Version=2020-2024; 409-06 Information Systems, Process and Knowledge Management


