
Medical Concept Embeddings via Labeled Background Corpora

datacite.relation.isSupplementTo ISBN/978-2-9517408-9-1
datacite.relation.isVersionOf https://www.ke.tu-darmstadt.de/resources/medsim
dc.contributor.author Loza Mencia, Eneldo
dc.contributor.author de Melo, Gerard
dc.contributor.author Nam, Jinseok
dc.date.accessioned 2021-09-26T20:59:54Z
dc.date.available 2021-09-26T20:59:54Z
dc.date.created 2016
dc.date.issued 2021-09-26
dc.description This entry contains the resources used in and resulting from: Eneldo Loza Mencía, Gerard de Melo and Jinseok Nam, Medical Concept Embeddings via Labeled Background Corpora, in: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), 2016. In recent years, we have seen an increasing amount of interest in low-dimensional vector representations of words. Among other things, these facilitate computing word similarity and relatedness scores. The best-known examples of algorithms that produce representations of this sort are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Due to the simplicity and versatility of vector representations, these findings suggest that our resource can easily be used as a drop-in replacement to improve any system relying on medical concept similarity measures. de_DE
dc.description.version Version 1.0 de_DE
dc.identifier.uri https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2936
dc.language.iso en de_DE
dc.rights.license In Copyright (https://rightsstatements.org/vocab/InC/1.0/)
dc.subject Embeddings de_DE
dc.subject Medical Concepts de_DE
dc.subject Semantic Similarity de_DE
dc.subject MeSH de_DE
dc.subject.classification 1.14-03
dc.subject.classification 4.43-04
dc.subject.classification 4.43-05
dc.subject.ddc 400
dc.subject.ddc 004
dc.title Medical Concept Embeddings via Labeled Background Corpora de_DE
dc.type Dataset de_DE
dc.type Text de_DE
dc.type Model de_DE
dcterms.accessRights openAccess
tuda.history.classification Version=2016-2020;409-05 Interactive and Intelligent Systems, Image and Language Processing, Computer Graphics and Visualisation
tuda.history.classification Version=2020-2024;104-04 Applied Linguistics, Experimental Linguistics, Computational Linguistics
tuda.unit TUDa

Files

Original bundle

Name                                       Size       Format
BioASQ_train_full_no_desc.vectors          4.52 GB    Unknown data format
readme.txt                                 4.76 KB    Plain Text
ComputeEmbeddingsSimilaritiesForPairs.zip  32.06 MB   ZIP archive
seen_label_vocabulary.txt.gz               244.29 KB
MeSH_name_id_mapping_2015.txt.gz           249.38 KB
word_vocabulary.txt.gz                     29.22 MB
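
The description notes that such embeddings facilitate computing similarity and relatedness scores. As a rough illustration only, the sketch below computes cosine similarity, a common choice for comparing embedding vectors, on invented toy vectors. It does not read the files listed above: their format is documented in readme.txt, not here. The labels `concept_a`, `concept_b`, `concept_c` and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the record does not state which similarity measure the authors' own tooling uses.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, embeddings):
    """Rank all other labels by cosine similarity to `query`, highest first."""
    scores = [
        (label, cosine_similarity(embeddings[query], vector))
        for label, vector in embeddings.items()
        if label != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, invented for illustration (assumption:
# real concept vectors would be loaded as described in readme.txt).
toy_embeddings = {
    "concept_a": [1.0, 0.0, 0.0],
    "concept_b": [1.0, 1.0, 0.0],
    "concept_c": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    for label, score in rank_by_similarity("concept_a", toy_embeddings):
        print(f"{label}\t{score:.4f}")
```

With real vectors, the same ranking could be compared against human similarity judgments, which is the kind of evaluation the description refers to.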
