-----------------------------------------------------------
-----  Readme.txt for the Enron Crowdsourced Dataset ------

This file is the Readme file for the Enron Crowdsourced Dataset, a dataset of emails extracted from the Enron Emails Corpus that have been annotated in pairs on Amazon Mechanical Turk.

This dataset is described in the following paper:

@inproceedings{	TUD-CS-2014-0991,
	author = {Emily Jamison and Iryna Gurevych},
	title = {Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances
in Imbalanced Datasets},
	year = {2014},
	address = {Phuket, Thailand},
	booktitle = {Proceedings of the 28th Pacific Asia Conference on Language, Information
and Computing},
	pages = {244--253},
}

Each HIT contained 5 pairs of emails. For each pair, the Turkers were asked:
"Are these two emails part of the same email thread?  Please look at each of the pairs of emails and determine whether or not they are part of the same email thread/discussion."  Answers were given via radio buttons: Yes / Can't Tell / No.
In the .results file, labels are in the format:
yes-t407604e324a407604e326
This denotes a "yes" label for the pair {thread 407604, email 324} and {thread 407604, email 326}.  The known gold-standard answer for this pair is positive, because both emails belong to the same thread (407604).  One known-answer pair was embedded in the HITs and was annotated many more times than the others.
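The following is a minimal Python sketch for splitting a label string into its parts and deriving the gold standard from the thread IDs. It assumes the layout <label>-t<thread>e<email>a<thread>e<email>, inferred from the single example above; the names LABEL_PATTERN, PairLabel and parse_label are hypothetical, and the exact spelling of the "No" and "Can't Tell" labels in the .results file is not documented here, so the label part is matched loosely.

```python
import re
from dataclasses import dataclass

# Assumed layout (inferred from the example above):
#   <label>-t<thread1>e<email1>a<thread2>e<email2>
# The label is matched loosely because the spellings of labels other than "yes" are assumptions.
LABEL_PATTERN = re.compile(
    r"^(?P<label>.+)-t(?P<t1>\d+)e(?P<e1>\d+)a(?P<t2>\d+)e(?P<e2>\d+)$"
)


@dataclass(frozen=True)
class PairLabel:
    label: str
    email1: tuple[int, int]  # (thread id, email id)
    email2: tuple[int, int]  # (thread id, email id)

    @property
    def gold_same_thread(self) -> bool:
        # Gold standard: the pair is positive iff both emails share a thread id.
        return self.email1[0] == self.email2[0]


def parse_label(raw: str) -> PairLabel:
    """Split one label string (hypothetical helper) into label and email ids."""
    m = LABEL_PATTERN.match(raw.strip())
    if m is None:
        raise ValueError(f"unrecognized label string: {raw!r}")
    return PairLabel(
        label=m["label"],
        email1=(int(m["t1"]), int(m["e1"])),
        email2=(int(m["t2"]), int(m["e2"])),
    )


if __name__ == "__main__":
    pair = parse_label("yes-t407604e324a407604e326")
    print(pair.label, pair.email1, pair.email2, pair.gold_same_thread)
    # prints: yes (407604, 324) (407604, 326) True
```

Comparing pair.label with pair.gold_same_thread across all annotations of the embedded known-answer pair is one way to check worker reliability, but that analysis is an illustration rather than part of the dataset release.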


Processing:
We have anonymized worker IDs, while retaining a 1:1 correspondence between original and anonymized IDs.
Also, we have redacted the following fields:
"feedback": the message we sent a worker when accepting or rejecting the HIT
"Answer.TxtareaInput": the message the worker sent us with the HIT


Copyright: 
This dataset is released by UKP Lab, TU Darmstadt under the Creative Commons Attribution/Share-Alike License (CC-BY-SA).  UKP Lab's ownership of the data derives from Amazon Mechanical Turk's Conditions of Use:
"[...] all ownership rights, including worldwide intellectual property rights, will vest with the Requester immediately upon [Turker's] performance of the Service. To the extent any such rights do not vest in Requester under applicable law, [Turker] hereby assign or exclusively grant (without the right to any compensation) all right, title and interest, including all intellectual property rights, to such work product to Requester."
https://www.mturk.com/mturk/conditionsofuse

