TUdatalib: Constrained C-Test Generation via Mixed-Integer Programming (Supplementary Material)

Count of file(s): 11

GPT-4.zip (28.66KB)

User Study Data.zip (46.34KB)

Variability Data.zip (431.4KB)

sentence_scoring.jar (64.60MB)

feature_extraction.jar (188.1MB)

Date

2024-04

Author

Lee, Ji-Ung

Pfetsch, Marc

Gurevych, Iryna

Type

Dataset
Text
Software

Description

This work proposes a novel method for generating C-Tests, a variant of cloze tests (gap-filling exercises) in which only the last part of a word is turned into a gap. In contrast to previous work, which varies either the gap size or the gap placement and therefore reaches only locally optimal solutions, we propose a mixed-integer programming (MIP) approach. This allows us to consider gap size and placement simultaneously, to achieve globally optimal solutions, and to directly integrate state-of-the-art models for gap difficulty prediction into the optimization problem. A user study with 40 participants across four C-Test generation strategies (including GPT-4) shows that our approach (*MIP*) significantly outperforms two of the baseline strategies (based on gap placement and GPT-4) and performs on par with the third (based on gap size). Our analysis shows that GPT-4 still struggles to fulfill explicit constraints during generation and that *MIP* produces C-Tests that correlate best with the perceived difficulty. We publish our code, model, and collected data consisting of 32 English C-Tests with 20 gaps each (3,200 in total) under an open source license.
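To make the two quantities named in the description concrete, the following minimal Python sketch builds a C-Test from a token list by choosing which words are gapped (placement) and how many trailing characters are removed (gap size). It is only an illustration and not the MIP method or the published code: the function names `make_gap` and `make_c_test`, the example sentence, and the default of removing half of a word (rounded down) are assumptions introduced here.

```python
def make_gap(word: str, gap_size: int) -> str:
    """Replace the last `gap_size` characters of `word` with underscores."""
    # Clamp so the gap never exceeds the word length or goes negative.
    gap_size = max(0, min(gap_size, len(word)))
    return word[: len(word) - gap_size] + "_" * gap_size


def make_c_test(tokens, placement, gap_sizes=None):
    """Gap the tokens whose indices are in `placement`.

    `gap_sizes` maps a token index to the number of trailing characters
    to remove. Assumption: if no size is given, half of the word
    (rounded down) is removed.
    """
    gap_sizes = gap_sizes or {}
    result = []
    for i, token in enumerate(tokens):
        if i in placement:
            result.append(make_gap(token, gap_sizes.get(i, len(token) // 2)))
        else:
            result.append(token)
    return result


# Hypothetical example sentence, not taken from the dataset.
tokens = "The weather changed quickly during the afternoon".split()

# Same placement, two different gap-size choices for the word at index 1.
default = make_c_test(tokens, placement={1, 3, 5})
custom = make_c_test(tokens, placement={1, 3, 5}, gap_sizes={1: 5})

print(" ".join(default))
print(" ".join(custom))
```

Varying `placement` and `gap_sizes` independently corresponds to the two baseline families mentioned in the description; an approach that optimizes both jointly would search over both arguments at once.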

Collections

C-Tests [3]

License

Creative Commons Attribution 4.0, except where otherwise noted
