BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)
Description
BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE).
In an evaluation using fine-grained entity typing as a testbed, BPEmb performs competitively with, and for some languages better than, alternative subword approaches, while requiring vastly fewer resources and no tokenization.
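Because BPEmb is tokenization-free, using the embeddings amounts to segmenting raw text into BPE subwords and looking up their vectors. The sketch below illustrates this with the bpemb Python package distributed with the resource; the vocabulary size (vs=50000) and dimensionality (dim=100) are assumptions, chosen from the several variants the collection provides per language.

    # Minimal sketch, assuming the bpemb Python package is installed
    # (pip install bpemb); vs and dim select one of several published variants.
    from bpemb import BPEmb

    # Downloads and caches the English BPE model and embeddings on first use.
    bpemb_en = BPEmb(lang="en", vs=50000, dim=100)

    # No tokenizer needed: raw text is segmented directly into BPE subwords.
    subwords = bpemb_en.encode("subword embeddings")
    print(subwords)  # e.g. ['▁sub', 'word', '▁embed', 'd', 'ings']; varies with vs

    # Each subword maps to a pre-trained vector: a (num_subwords, dim)
    # NumPy array that can serve directly as input features to a model.
    vectors = bpemb_en.embed("subword embeddings")
    print(vectors.shape)  # (len(subwords), 100)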
Citation
Benjamin Heinzerling and Michael Strube. 2018. BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan.