TUdatalib : Medical Concept Embeddings via Labeled Background Corpora

dc.contributor.author	Loza Mencia, Eneldo
dc.contributor.author	de Melo, Gerard
dc.contributor.author	Nam, Jinseok
dc.date.accessioned	2021-09-26T20:59:54Z
dc.date.available	2021-09-26T20:59:54Z
dc.date.issued	2016
dc.identifier.uri	https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2936
dc.description	This entry contains the resources used in and resulting from: Eneldo Loza Mencía, Gerard de Melo and Jinseok Nam, Medical Concept Embeddings via Labeled Background Corpora, in: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), 2016. Abstract: In recent years, we have seen an increasing amount of interest in low-dimensional vector representations of words. Among other things, these facilitate computing word similarity and relatedness scores. The most well-known example of algorithms to produce representations of this sort are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Due to the simplicity and versatility of vector representations, these findings suggest that our resource can easily be used as a drop-in replacement to improve any systems relying on medical concept similarity measures.	de_DE
dc.language.iso	en	de_DE
dc.relation	IsVersionOf;URL;https://www.ke.tu-darmstadt.de/resources/medsim
dc.relation	IsSupplementTo;ISBN;978-2-9517408-9-1
dc.rights.uri	https://rightsstatements.org/vocab/InC/1.0/
dc.subject	Embeddings	de_DE
dc.subject	Medical Concepts	de_DE
dc.subject	Semantic Similarity	de_DE
dc.subject	MeSH	de_DE
dc.subject.classification	1.14-03 Applied Linguistics, Computational Linguistics	de_DE
dc.subject.classification	4.43-04 Artificial Intelligence and Machine Learning Methods	de_DE
dc.subject.classification	4.43-05 Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
dc.subject.ddc	400
dc.subject.ddc	004
dc.title	Medical Concept Embeddings via Labeled Background Corpora	de_DE
dc.type	Dataset	de_DE
dc.type	Text	de_DE
dc.type	Model	de_DE
dc.description.version	Version 1.0	de_DE
tud.unit	TUDa
tud.history.classification	Version=2016-2020;409-05 Interactive and Intelligent Systems, Image and Language Processing, Computer Graphics and Visualisation
tud.history.classification	Version=2020-2024;104-04 Applied Linguistics, Experimental Linguistics, Computational Linguistics