TUdatalib: Lessons Learned from a Citizen Science Project for Natural Language Processing

Count of file(s): 1

citizen-tudatalib.zip (76.71MB)

Date

2023-05

Author

Moosavi, Nafise Sadat

Bates, Luke

Petrak, Dominic

Eckart de Castilho, Richard

Gurevych, Iryna

Type

Dataset

Description

This is the accompanying data for our paper "Lessons Learned from a Citizen Science Project for Natural Language Processing". Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide our code and data to aid future work on Citizen Science.

The following license files are associated with this item:

License description

Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial 4.0
