dc.contributor.author: Klie, Jan-Christoph
dc.contributor.author: Lee, Ji-Ung
dc.contributor.author: Stowe, Kevin
dc.contributor.author: Sahin, Gözde Gül
dc.contributor.author: Moosavi, Nafise Sadat
dc.contributor.author: Bates, Luke
dc.contributor.author: Petrak, Dominic
dc.contributor.author: Eckart de Castilho, Richard
dc.contributor.author: Gurevych, Iryna
dc.date.accessioned: 2023-09-08T14:20:46Z
dc.date.available: 2023-09-08T14:20:46Z
dc.date.issued: 2023-05
dc.identifier.uri: https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3942
dc.description: This is the accompanying data for our paper "Lessons Learned from a Citizen Science Project for Natural Language Processing". Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide our code and data to aid future work on Citizen Science. [de_DE]
dc.relation: IsSupplementTo;URL;https://aclanthology.org/2023.eacl-main.261/
dc.rights: Creative Commons Attribution-NonCommercial 4.0
dc.rights.uri: https://creativecommons.org/licenses/by-nc/4.0/
dc.subject: citizen science [de_DE]
dc.subject: annotation [de_DE]
dc.subject: nlp [de_DE]
dc.subject.classification: 4.43-04 Artificial Intelligence and Machine Learning Methods [de_DE]
dc.subject.classification: 4.43-05 Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
dc.subject.ddc: 004
dc.title: Lessons Learned from a Citizen Science Project for Natural Language Processing [de_DE]
dc.type: Dataset [de_DE]
tud.unit: TUDa
tud.history.classification: Version=2016-2020;409-05 Interactive and Intelligent Systems, Image and Language Processing, Computer Graphics and Visualisation

