
EUR-Lex Dataset

datacite.relation.isSupplementTo https://doi.org/10.1007/978-3-540-87481-2_4
datacite.relation.isSupplementTo https://doi.org/10.1007/978-3-642-12837-0_11
dc.contributor.author Loza Mencia, Eneldo
dc.contributor.author Fürnkranz, Johannes
dc.contributor.author loza
dc.date.accessioned 2021-09-27T00:16:16Z
dc.date.available 2021-09-27T00:16:16Z
dc.date.created 2010
dc.date.issued 2021-09-27
dc.description The EUR-Lex text collection is a collection of documents about European Union law. It contains many different types of documents, including treaties, legislation, case law and legislative proposals, which are indexed according to several orthogonal categorization schemes to allow for multiple search facilities. The most important categorization is provided by the EUROVOC descriptors, which form a topic hierarchy with almost 4000 categories covering different aspects of European law. This document collection provides an excellent opportunity to study text classification techniques for several reasons:
- it contains multiple classifications of the same documents, making it possible to analyze the effects of different classification properties using the same underlying reference data without resorting to artificial or manipulated classifications,
- the overwhelming number of documents produced makes the legal domain a very attractive field for employing supportive automated solutions, and therefore a machine learning scenario in step with actual practice,
- the documents are available in several European languages and are hence very interesting, e.g., for the wide field of multi- and cross-lingual text classification,
- and, finally, the data is freely accessible (at http://eur-lex.europa.eu/).
The database constitutes a very challenging multilabel scenario due to the high number of possible labels (up to 4000). A first step towards analyzing this database was taken by applying multilabel classification techniques to three of its categorization schemes in the following work: Eneldo Loza Mencía and Johannes Fürnkranz. Efficient multilabel classification algorithms for large-scale problems in the legal domain. In Semantic Processing of Legal Texts, pages 192-215, Springer-Verlag, 2010. http://www.ke.tu-darmstadt.de/publications/papers/loza10eurlex.pdf de_DE
dc.description.version Version 2010 de_DE
dc.identifier.uri https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2937
dc.language.iso en de_DE
dc.rights.license In Copyright (https://rightsstatements.org/vocab/InC/1.0/)
dc.subject Multi-label classification de_DE
dc.subject EUR-Lex de_DE
dc.subject EUROVOC de_DE
dc.subject.classification 4.43-04
dc.subject.classification 4.43-05
dc.subject.ddc 004
dc.title EUR-Lex Dataset de_DE
dc.type Dataset de_DE
dc.type Text de_DE
dcterms.accessRights openAccess
person.identifier.orcid #PLACEHOLDER_PARENT_METADATA_VALUE#
person.identifier.orcid 0000-0002-1207-0159
person.identifier.orcid #PLACEHOLDER_PARENT_METADATA_VALUE#
tuda.history.classification Version=2016-2020;409-05 Interactive and Intelligent Systems, Image and Language Processing, Computer Graphics and Visualisation
tuda.unit TUDa
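
To make the multilabel setting from the description above concrete: each document is assigned a set of labels (for example EUROVOC descriptors) drawn from a label space of up to roughly 4000 categories. The sketch below encodes such label sets as binary indicator vectors and computes two common summary statistics, label cardinality (the average number of labels per document) and label density (cardinality divided by the size of the label space). The document IDs and label names are hypothetical placeholders, not values from the dataset.

```python
from typing import List, Set


def to_indicator_matrix(label_sets: List[Set[str]], labels: List[str]) -> List[List[int]]:
    """Encode each document's label set as a 0/1 vector over a fixed label order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = []
    for label_set in label_sets:
        row = [0] * len(labels)
        for label in label_set:
            row[index[label]] = 1  # mark every label assigned to this document
        matrix.append(row)
    return matrix


def label_cardinality(label_sets: List[Set[str]]) -> float:
    """Average number of labels assigned per document."""
    return sum(len(s) for s in label_sets) / len(label_sets)


def label_density(label_sets: List[Set[str]], num_labels: int) -> float:
    """Label cardinality normalized by the size of the label space."""
    return label_cardinality(label_sets) / num_labels


# Hypothetical toy data: three documents, five placeholder labels.
labels = ["agriculture", "competition", "customs", "environment", "fisheries"]
docs = {
    "doc_a": {"agriculture", "environment"},
    "doc_b": {"competition"},
    "doc_c": {"customs", "fisheries", "agriculture"},
}
label_sets = list(docs.values())

matrix = to_indicator_matrix(label_sets, labels)
print(matrix)                                    # [[1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 1, 0, 1]]
print(label_cardinality(label_sets))             # 2.0
print(label_density(label_sets, len(labels)))    # 0.4
```

Dense 0/1 lists are used only to keep the toy example readable. With a label space of several thousand categories and comparatively few labels per document, a sparse representation (for example, storing only the indices of assigned labels) would presumably be the more practical choice.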

Files

Original bundle

Files 1-20 of 26:

Name                                   Size       Format
readme.html                            23.94 KB   Hypertext Markup Language
dia_eurlex_labelsetsizes.gif           32.63 KB   Graphics Interchange Format
dia_eurlex_labelsizes.gif              37.83 KB   Graphics Interchange Format
dia_eurlex_labelsizesfreq.gif          39.9 KB    Graphics Interchange Format
eurovoc_graph.jpg                      50.37 KB   Joint Photographic Experts Group/JPEG File Interchange Format (JFIF)
Convert2MulanArff.class                8.82 KB    Unknown data format
Convert2MulanArff.java                 8.47 KB    Unknown data format
DocumentFrequencyAttributeEval.class   4.26 KB    Unknown data format
DocumentFrequencyAttributeEval.java    7.25 KB    Unknown data format
english.stop                           3.5 KB     Unknown data format
eurlex_CV10.zip                        268.36 MB  ZIP archive
eurlex_dc_nA-5k_CV10.mulan.zip         217.44 MB  ZIP archive
eurlex_dc_tokenstring.mulan.arff.gz    35.91 MB   (not specified)
eurlex_download_EN_NOT.sh.gz           116.56 KB  (not specified)
eurlex_dc-all_nA-5k_CV10.zip           218.5 MB   ZIP archive
eurlex_ev_nA-5k_CV10.mulan.zip         220.61 MB  ZIP archive
eurlex_ev_tokenstring.mulan.arff.gz    38.48 MB   (not specified)
eurlex_ID_mappings.csv.gz              156.5 KB   (not specified)
eurlex_html_EN_NOT.zip                 160.03 MB  ZIP archive
eurlex_id2class.zip                    1.5 MB     ZIP archive
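
Several files in this bundle carry an `.arff.gz` suffix, which suggests gzip-compressed ARFF files (the `mulan` part of the name points to the Mulan multilabel format). Assuming they follow standard ARFF syntax, with `@relation`, `@attribute` and `@data` declarations and `%` comments, the sketch below streams just the header of such a file through Python's `gzip` module and lists the declared attributes. The helper names `parse_arff_header` and `read_arff_gz_header` are hypothetical. The sketch does not assume which attributes are labels: in Mulan's convention, labels are usually declared as nominal `{0,1}` attributes and named in a separate XML file, but that is not confirmed for these particular files.

```python
import gzip
from typing import IO, List, Tuple


def _split_name(rest: str) -> Tuple[str, str]:
    """Split '<name> <type>' where the name may be quoted and contain spaces."""
    if rest[:1] in ("'", '"'):
        quote = rest[0]
        end = rest.index(quote, 1)  # position of the closing quote
        return rest[1:end], rest[end + 1:].strip()
    name, type_spec = rest.split(None, 1)
    return name, type_spec.strip()


def parse_arff_header(stream: IO[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Read ARFF header lines up to @data; return (relation name, [(attribute, type), ...])."""
    relation = ""
    attributes: List[Tuple[str, str]] = []
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        keyword = line.split(None, 1)[0].lower()  # ARFF keywords are case-insensitive
        if keyword == "@data":
            break  # header ends here; the (possibly large) data section is never read
        rest = line[len(keyword):].strip()
        if keyword == "@relation":
            relation = rest.strip("'\"")
        elif keyword == "@attribute":
            attributes.append(_split_name(rest))
    return relation, attributes


def read_arff_gz_header(path: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Decompress on the fly (text mode) and parse only the ARFF header."""
    # UTF-8 is an assumption; undecodable bytes are replaced rather than raising.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return parse_arff_header(f)
```

For instance, `read_arff_gz_header("eurlex_ev_tokenstring.mulan.arff.gz")` would return the relation name and the attribute list of that file, provided it really is a plain ARFF file inside the gzip wrapper. Stopping at `@data` keeps memory use small even for the larger archives.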

Collections