Note
This is not the latest version of this dataset. The latest version is available at: https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/1915.2
DBS Corpus
dc.contributor.author | Benikova, Darina | |
dc.contributor.author | Mieskes, Margot | |
dc.contributor.author | Meyer, Christian M. | |
dc.contributor.author | Gurevych, Iryna | |
dc.date.accessioned | 2019-02-18T09:28:24Z | |
dc.date.available | 2019-02-18T09:28:24Z | |
dc.date.issued | 2016-11-28 | |
dc.identifier.uri | https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/1915 | |
dc.description | The DBS corpus contains 93 multi-document summaries for 293 German documents about 30 education-related topics. We sampled the topics from the Deutscher Bildungsserver (DBS) webpage and crawled the documents linked there. The documents are highly heterogeneous in terms of text type, genre, and style. The multi-document summaries are the result of a seven-step annotation process yielding coherent extracts, a novel type of summary based on phrases extracted from the original documents, which are ordered and minimally redacted to form a readable, coherent text. Data from all intermediate steps is included in the repository to allow for extensive system evaluation. If you use the corpus in academic work, please cite our COLING paper. | en_US |
dc.language.iso | de | en_US |
dc.relation | IsDescribedBy;URL;https://aclweb.org/anthology/C16-1099 | |
dc.relation | References;URL;https://www.bildungsserver.de/ | |
dc.relation | IsVersionOf;URL;https://github.com/AIPHES/DBS | |
dc.rights | Creative Commons Attribution Share-Alike 4.0 | |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ | |
dc.subject | Multi-document Summarization | en_US |
dc.subject | Heterogeneous Sources | en_US |
dc.subject | Information Aggregation | en_US |
dc.subject | Natural Language Processing | en_US |
dc.subject | AIPHES | en_US |
dc.subject.ddc | 000 Computer science, information science, general works | en_US |
dc.subject.ddc | 430 Germanic languages; German | en_US |
dc.title | DBS Corpus | en_US |
dc.type | Dataset | en_US |
dc.type | Text | en_US |
dc.type | Workflow | en_US |
dc.description.version | 1.0 | |
tud.tubiblio | 97945 |
Files in this item
This dataset appears in:
- Summarization [4]
Summarization Corpora and Tools