Constrained C-Test Generation via Mixed-Integer Programming (Supplementary Material)

dc.contributor.author Lee, Ji-Ung
dc.contributor.author Pfetsch, Marc
dc.contributor.author Gurevych, Iryna
dc.date.accessioned 2024-04-08T09:54:18Z
dc.date.available 2024-04-08T09:54:18Z
dc.date.created 2024-04
dc.date.issued 2024-04-08
dc.description This work proposes a novel method to generate C-Tests, a variant of cloze tests (gap-filling exercises) in which only the last part of a word is turned into a gap. In contrast to previous work, which varies only the gap size or the gap placement and therefore reaches locally optimal solutions, we propose a mixed-integer programming (MIP) approach. This allows us to consider gap size and placement simultaneously, to achieve globally optimal solutions, and to directly integrate state-of-the-art models for gap difficulty prediction into the optimization problem. A user study with 40 participants across four C-Test generation strategies (including GPT-4) shows that our approach (*MIP*) significantly outperforms two of the baseline strategies (based on gap placement and GPT-4) and performs on par with the third (based on gap size). Our analysis shows that GPT-4 still struggles to fulfill explicit constraints during generation and that *MIP* produces C-Tests that correlate best with the perceived difficulty. We publish our code, model, and collected data consisting of 32 English C-Tests with 20 gaps each (3,200 gap responses in total) under an open source license. de_DE
dc.identifier.uri https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4205
dc.language.iso en de_DE
dc.rights.license CC-BY-4.0 (https://creativecommons.org/licenses/by/4.0)
dc.subject C-Test de_DE
dc.subject NLP de_DE
dc.subject Language Learning de_DE
dc.subject Constrained Optimization de_DE
dc.subject Machine Learning de_DE
dc.subject.classification 4.43-04
dc.subject.classification 4.43-05
dc.subject.ddc 004
dc.title Constrained C-Test Generation via Mixed-Integer Programming (Supplementary Material) de_DE
dc.type Dataset de_DE
dc.type Text de_DE
dc.type Software de_DE
dcterms.accessRights openAccess
person.identifier.orcid #PLACEHOLDER_PARENT_METADATA_VALUE#
person.identifier.orcid 0000-0002-0947-7193
person.identifier.orcid 0000-0003-2187-7621
tuda.history.classification Version=2016-2020;409-05 Interaktive und intelligente Systeme, Bild- und Sprachverarbeitung, Computergraphik und Visualisierung
tuda.unit TUDa

Files

Original bundle

Name | Description | Size | Format
GPT-4.zip | | 28.67 KB | ZIP archive
User Study Data.zip | | 46.35 KB | ZIP archive
Variability Data.zip | | 431.41 KB | ZIP archive
sentence_scoring.jar | | 64.61 MB | Unknown data format
feature_extraction.jar | | 188.17 MB | Unknown data format
XGB Model.zip | | 141.58 KB | ZIP archive
transformer_models_CLS.zip | Fine-tuned CLS prediction transformer models | 4.93 GB | ZIP archive
transformer_models_MR.zip | Fine-tuned transformer models for masked regression | 4.91 GB | ZIP archive
transformer_models_CLS-F.zip | Fine-tuned CLS+Feature prediction transformer models | 4.92 GB | ZIP archive
MLP.zip | Best-performing two-layer MLP models with linear and ReLU activations | 161.71 KB | ZIP archive
SVM.zip | Best-performing SVM model (c=0.01) for gap difficulty prediction | 1.13 KB | ZIP archive
