Constrained C-Test Generation via Mixed-Integer Programming (Supplementary Material)
| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lee, Ji-Ung | |
| dc.contributor.author | Pfetsch, Marc | |
| dc.contributor.author | Gurevych, Iryna | |
| dc.date.accessioned | 2024-04-08T09:54:18Z | |
| dc.date.available | 2024-04-08T09:54:18Z | |
| dc.date.created | 2024-04 | |
| dc.date.issued | 2024-04-08 | |
| dc.description | This work proposes a novel method to generate C-Tests, a variant of cloze tests (a gap-filling exercise) in which only the last part of a word is turned into a gap. Previous work varies only the gap size or only the gap placement and therefore reaches locally optimal solutions. In contrast, we propose a mixed-integer programming (MIP) approach. This allows us to consider gap size and placement simultaneously, to achieve globally optimal solutions, and to directly integrate state-of-the-art models for gap difficulty prediction into the optimization problem. A user study with 40 participants across four C-Test generation strategies (including GPT-4) shows that our approach (*MIP*) significantly outperforms two of the baseline strategies (based on gap placement and GPT-4) and performs on par with the third (based on gap size). Our analysis shows that GPT-4 still struggles to fulfill explicit constraints during generation and that *MIP* produces C-Tests that correlate best with the perceived difficulty. We publish our code, model, and collected data under an open source license. The data consists of 32 English C-Tests with 20 gaps each (3,200 gap responses in total). | de_DE |
| dc.identifier.uri | https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4205 | |
| dc.language.iso | en | de_DE |
| dc.rights.license | CC-BY-4.0 (https://creativecommons.org/licenses/by/4.0) | |
| dc.subject | C-Test | de_DE |
| dc.subject | NLP | de_DE |
| dc.subject | Language Learning | de_DE |
| dc.subject | Constrained Optimization | de_DE |
| dc.subject | Machine Learning | de_DE |
| dc.subject.classification | 4.43-04 | |
| dc.subject.classification | 4.43-05 | |
| dc.subject.ddc | 004 | |
| dc.title | Constrained C-Test Generation via Mixed-Integer Programming (Supplementary Material) | de_DE |
| dc.type | Dataset | de_DE |
| dc.type | Text | de_DE |
| dc.type | Software | de_DE |
| dcterms.accessRights | openAccess | |
| person.identifier.orcid | 0000-0002-0947-7193 | |
| person.identifier.orcid | 0000-0003-2187-7621 | |
| tuda.history.classification | Version=2016-2020;409-05 Interactive and Intelligent Systems, Image and Language Processing, Computer Graphics and Visualisation | |
| tuda.unit | TUDa | |
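The description states that the MIP approach chooses gap placement and gap size jointly instead of tuning one while the other is fixed. The minimal Python sketch below makes that joint search space concrete on a toy sentence. It is not the published method. `toy_difficulty` is a hypothetical stand-in for the learned difficulty predictors in this record. Exhaustive enumeration replaces the MIP solver and only works for tiny inputs. The function names `make_gap` and `best_c_test` are also illustrative.

```python
from __future__ import annotations

from itertools import combinations, product


def make_gap(word: str, size: int) -> str:
    """Replace the last `size` characters of `word` with underscores (C-Test gap)."""
    if not 0 < size < len(word):
        raise ValueError("a gap must leave at least one visible character")
    return word[:-size] + "_" * size


def toy_difficulty(word: str, size: int) -> float:
    """Hypothetical stand-in for a learned gap difficulty model.

    Assumption for illustration only: a gap is harder the larger the share
    of the word it hides. The published models are not reproduced here.
    """
    return size / len(word)


def best_c_test(words: list[str], num_gaps: int, target: float) -> tuple[float, dict[int, int]]:
    """Jointly choose gap positions and gap sizes by exhaustive enumeration.

    Minimises |mean predicted difficulty - target| over every choice of
    `num_gaps` words and every valid gap size for each chosen word.
    Enumeration stands in for a MIP solver and only scales to toy inputs.
    """
    candidates = [i for i, w in enumerate(words) if len(w) > 1]
    if not 1 <= num_gaps <= len(candidates):
        raise ValueError("num_gaps must be between 1 and the number of gappable words")
    best: tuple[float, dict[int, int]] | None = None
    for positions in combinations(candidates, num_gaps):
        # Each chosen word can hide 1 .. len(word) - 1 trailing characters.
        size_options = [range(1, len(words[i])) for i in positions]
        for sizes in product(*size_options):
            mean = sum(toy_difficulty(words[i], s) for i, s in zip(positions, sizes)) / num_gaps
            error = abs(mean - target)
            if best is None or error < best[0]:
                best = (error, dict(zip(positions, sizes)))
    assert best is not None
    return best


words = "The quick brown fox jumps over the lazy dog".split()
error, gaps = best_c_test(words, num_gaps=3, target=0.5)
print(" ".join(make_gap(w, gaps[i]) if i in gaps else w for i, w in enumerate(words)))
print(f"deviation from target difficulty: {error:.3f}")
```

Because the objective couples all gaps, fixing positions first and only then tuning sizes (or the reverse) can miss the joint optimum. One natural MIP encoding, assumed here rather than taken from the paper, uses one binary variable per (word, gap size) pair and lets a solver search the same space without enumerating it.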
Files
Original bundle
| Name | Description | Size | Format |
|---|---|---|---|
| GPT-4.zip | | 28.67 KB | ZIP archive |
| User Study Data.zip | | 46.35 KB | ZIP archive |
| Variability Data.zip | | 431.41 KB | ZIP archive |
| sentence_scoring.jar | | 64.61 MB | Unknown data format |
| feature_extraction.jar | | 188.17 MB | Unknown data format |
| XGB Model.zip | | 141.58 KB | ZIP archive |
| transformer_models_CLS.zip | Fine-tuned transformer models for CLS prediction | 4.93 GB | ZIP archive |
| transformer_models_MR.zip | Fine-tuned transformer models for masked regression | 4.91 GB | ZIP archive |
| transformer_models_CLS-F.zip | Fine-tuned transformer models for CLS+feature prediction | 4.92 GB | ZIP archive |
| MLP.zip | Best-performing two-layer MLP models with linear and ReLU activations | 161.71 KB | ZIP archive |
| SVM.zip | Best-performing SVM model (c=0.01) for gap difficulty prediction | 1.13 KB | ZIP archive |
