BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)
dc.creator | Heinzerling, Benjamin
dc.date | 2019-02-06
dc.identifier | https://doi.org/10.11588/data/V9CXPR
dc.description | BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as a testbed, BPEmb performs competitively, and for some languages better, than alternative subword approaches, while requiring vastly fewer resources and no tokenization. (A brief usage sketch follows the record below.)
dc.language | Not applicable
dc.publisher | heiDATA
dc.subject | Computer and Information Science
dc.subject | subword embeddings
dc.subject | byte-pair encoding
dc.subject | multilingual
dc.title | BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018) |
Files in this item
There are no files associated with this dataset.
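The description above notes that BPEmb works on raw text with no tokenization step. A minimal sketch of how such embeddings are typically loaded and applied, assuming the companion `bpemb` Python package (`pip install bpemb`), which downloads the pre-trained models on first use; this setup is an illustration, not part of the dataset record itself:

```python
# Minimal usage sketch (assumes the companion "bpemb" Python package;
# not part of this dataset record).
from bpemb import BPEmb

# Load English subword embeddings: 100 dimensions, 10k-merge BPE vocabulary.
bpemb_en = BPEmb(lang="en", vs=10000, dim=100)

# BPE segmentation operates directly on raw text, so no tokenizer is needed.
subwords = bpemb_en.encode("unsupervised multilingual embeddings")
print(subwords)  # a list of subword strings

# Embed the same text: one 100-dimensional vector per subword unit.
vectors = bpemb_en.embed("unsupervised multilingual embeddings")
print(vectors.shape)  # (number_of_subwords, 100)
```

Because segmentation and embedding lookup both operate on raw text, the same two calls should work unchanged for any of the 275 languages in the collection by switching the `lang` argument.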