TUdatalib: Lessons Learned from a Citizen Science Project for Natural Language Processing



dc.contributor.author	Klie, Jan-Christoph
dc.contributor.author	Lee, Ji-Ung
dc.contributor.author	Stowe, Kevin
dc.contributor.author	Sahin, Gözde Gül
dc.contributor.author	Moosavi, Nafise Sadat
dc.contributor.author	Bates, Luke
dc.contributor.author	Petrak, Dominic
dc.contributor.author	Eckart de Castilho, Richard
dc.contributor.author	Gurevych, Iryna
dc.date.accessioned	2023-09-08T14:20:46Z
dc.date.available	2023-09-08T14:20:46Z
dc.date.issued	2023-05
dc.identifier.uri	https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3942
dc.description	This is the accompanying data for our paper "Lessons Learned from a Citizen Science Project for Natural Language Processing". Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide our code and data to aid future work on Citizen Science.	de_DE
dc.relation	IsSupplementTo;URL;https://aclanthology.org/2023.eacl-main.261/
dc.rights	Creative Commons Attribution-NonCommercial 4.0
dc.rights.uri	https://creativecommons.org/licenses/by-nc/4.0/
dc.subject	citizen science	de_DE
dc.subject	annotation	de_DE
dc.subject	nlp	de_DE
dc.subject.classification	4.43-04 Artificial Intelligence and Machine Learning Methods	de_DE
dc.subject.classification	4.43-05 Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
dc.subject.ddc	004
dc.title	Lessons Learned from a Citizen Science Project for Natural Language Processing	de_DE
dc.type	Dataset	de_DE
tud.unit	TUDa
tud.history.classification	Version=2016-2020;409-05 Interactive and Intelligent Systems, Image and Language Processing, Computer Graphics and Visualisation

Files in this item

Name: citizen-tudatalib.zip
Size: 76.71 MB
Format: application/zip


Name: license_CC-BY-NC-4.0.rdf
Size: 9.525 KB
Format: application/rdf+xml
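The data archive is distributed as a plain ZIP file, and its internal layout is not described on this page. As a minimal sketch, assuming Python 3 and that the archive has been downloaded locally, its members can be listed with the standard-library `zipfile` module before anything is extracted. The helper name `list_archive` is hypothetical and not part of the dataset.

```python
import zipfile
from pathlib import Path
from typing import List, Tuple, Union


def list_archive(path: Union[str, Path]) -> List[Tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for each entry of a ZIP archive.

    Hypothetical helper for inspecting the downloaded dataset archive;
    it only reads the archive's directory and extracts nothing to disk.
    """
    with zipfile.ZipFile(path) as archive:
        # infolist() returns ZipInfo objects in archive order, directory entries included
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

For example, `list_archive("citizen-tudatalib.zip")` returns one `(name, size)` pair per entry; a truncated or corrupted download typically raises `zipfile.BadZipFile` at this step, before any call to `ZipFile.extractall`.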




Except where otherwise noted, this item's license is described as: Creative Commons Attribution-NonCommercial 4.0