BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)
dc.creator | Heinzerling, Benjamin
dc.date | 2019-02-06
dc.identifier | https://doi.org/10.11588/data/V9CXPR
dc.description | BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as a testbed, BPEmb performs competitively, and for some languages better, than alternative subword approaches, while requiring vastly fewer resources and no tokenization. (A brief usage sketch follows the record below.)
dc.language | Not applicable
dc.publisher | heiDATA
dc.subject | Computer and Information Science
dc.subject | subword embeddings
dc.subject | byte-pair encoding
dc.subject | multilingual
dc.title | BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018) |
Files in this item
There are no files associated with this dataset.
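The description above notes that BPEmb works on raw text with no tokenization step. A minimal sketch of how such embeddings are typically loaded and applied, assuming the companion `bpemb` Python package (`pip install bpemb`), which downloads the pre-trained models on first use; this setup is an illustration, not part of the dataset record itself:

```python
# Minimal usage sketch (assumes the companion "bpemb" Python package;
# not part of this dataset record).
from bpemb import BPEmb

# Load English subword embeddings: 100 dimensions, 10k-merge BPE vocabulary.
bpemb_en = BPEmb(lang="en", vs=10000, dim=100)

# BPE segmentation operates directly on raw text, so no tokenizer is needed.
subwords = bpemb_en.encode("unsupervised multilingual embeddings")
print(subwords)  # a list of subword strings

# Embed the same text: one 100-dimensional vector per subword unit.
vectors = bpemb_en.embed("unsupervised multilingual embeddings")
print(vectors.shape)  # (number_of_subwords, 100)
```

Because segmentation and embedding lookup both operate on raw text, the same two calls should work unchanged for any of the 275 languages in the collection by switching the `lang` argument.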