
Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions

datacite.relation.isCitedBy https://www.arxiv.org/abs/2501.01872
dc.contributor.author Sachdeva, Rachneet
dc.contributor.author Hazra, Rima
dc.contributor.author Gurevych, Iryna
dc.date.accessioned 2025-07-07T12:33:34Z
dc.date.created 2025-07-04
dc.date.issued 2025-07-07
dc.description Code and data associated with "Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions".
dc.identifier.uri https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4666
dc.language.iso en
dc.rights.license CC-BY-4.0 (https://creativecommons.org/licenses/by/4.0)
dc.subject jailbreak attack, model robustness
dc.subject.classification 4.43-05
dc.subject.ddc 004
dc.title Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions
dc.type Dataset
dc.type Software
dcterms.accessRights openAccess
tuda.agreements true
tuda.unit TUDa

Files

Original bundle

Name: arxiv2025-poate-attack.zip
Size: 2.35 MB
Format: ZIP archive
