BPEmb: Pre-trained Subword Embeddings in 275 Languages (LREC 2018)

Author
Heinzerling, Benjamin

Description

BPEmb is a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as a testbed, BPEmb performs competitively with alternative subword approaches, and outperforms them for some languages, while requiring vastly fewer resources and no tokenization.
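
The description names Byte-Pair Encoding (BPE) as the basis of the subword vocabulary. The sketch below is a minimal, illustrative Python implementation of the BPE merge procedure on a hypothetical toy corpus (not drawn from this dataset): the most frequent adjacent symbol pair is merged repeatedly, and the learned merges define the subword units that pre-trained embeddings of this kind cover.

import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the segmented vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Fuse every standalone occurrence of the pair into one symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy word-frequency table; words are pre-split into characters plus an end-of-word marker.
vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}

num_merges = 10  # illustrative; released BPE vocabularies use far more merge operations
merges = []
for _ in range(num_merges):
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    merges.append(best)

print(merges)       # learned merge operations, most frequent first
print(list(vocab))  # words segmented into the resulting subword units

Because segmentation with such learned merges operates directly on character sequences, it needs no language-specific tokenizer, which is the "no tokenization" property the description highlights.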

Subject

Computer and Information Science; subword embeddings; byte-pair encoding; multilingual

URI

https://doi.org/10.11588/data/V9CXPR

Collections

  • AIPHES Heidelberg [5]