Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions
| datacite.relation.isCitedBy | https://www.arxiv.org/abs/2501.01872 | |
| dc.contributor.author | Sachdeva, Rachneet | |
| dc.contributor.author | Hazra, Rima | |
| dc.contributor.author | Gurevych, Iryna | |
| dc.date.accessioned | 2025-07-07T12:33:34Z | |
| dc.date.created | 2025-07-04 | |
| dc.date.issued | 2025-07-07 | |
| dc.description | Code and data associated with "Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions". | |
| dc.identifier.uri | https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4666 | |
| dc.language.iso | en | |
| dc.rights.license | CC-BY-4.0 (https://creativecommons.org/licenses/by/4.0) | |
| dc.subject | jailbreak attack, model robustness | |
| dc.subject.classification | 4.43-05 | |
| dc.subject.ddc | 004 | |
| dc.title | Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions | |
| dc.type | Dataset | |
| dc.type | Software | |
| dcterms.accessRights | openAccess | |
| tuda.agreements | true | |
| tuda.unit | TUDa |
Files
Original bundle
| Name | Description | Size | Format |
|---|---|---|---|
| arxiv2025-poate-attack.zip | | 2.35 MB | ZIP archive |
