## Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions
#### Authors: Qian Ruan, Ilia Kuznetsov, Iryna Gurevych
#### UKP Lab, Technical University of Darmstadt, Germany
#### Contact: ruan@ukp.tu-darmstadt.de

### Data Structure
> Re3-Sci2.0
   > data_auto: collection of LLM annotations
     > edits_s.csv: sentence-level edits, with the corresponding edit action and intent labels. The columns are:
       -- 'doc_name': the unique name of the document,
       -- 'node_ix_src': the unique id for the source node, i.e., the new sentence,
       -- 'node_ix_tgt': the unique id for the target node, i.e., the old sentence,
       -- 'text_src': the content of the new sentence, empty in cases of deletions,
       -- 'text_tgt': the content of the old sentence, empty in cases of additions,
       -- 'ea': the edit action label,
       -- 'ei': the edit intent label.
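
As a minimal sketch of how the columns of edits_s.csv might be consumed, the snippet below reads the file with Python's standard `csv` module and infers from the empty text fields whether a row looks like an addition or a deletion. The helper names `read_edits` and `edit_shape` are hypothetical, and the sample rows, node ids, and label values are made-up placeholders, not data from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed above for edits_s.csv.
EDIT_COLUMNS = ["doc_name", "node_ix_src", "node_ix_tgt",
                "text_src", "text_tgt", "ea", "ei"]


def read_edits(handle):
    """Read sentence-level edits from an open CSV handle into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = set(EDIT_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def edit_shape(row):
    """Classify a row by which text field is empty (hypothetical helper)."""
    has_new = bool(row["text_src"].strip())
    has_old = bool(row["text_tgt"].strip())
    if has_new and not has_old:
        return "addition"   # text_tgt is empty in cases of additions
    if has_old and not has_new:
        return "deletion"   # text_src is empty in cases of deletions
    return "other"          # both sentences present (or both empty)


# Made-up rows for illustration only; ids and labels are placeholders.
SAMPLE = """doc_name,node_ix_src,node_ix_tgt,text_src,text_tgt,ea,ei
doc_a,s1,,A newly added sentence.,,EA_PLACEHOLDER,EI_PLACEHOLDER
doc_a,,t4,,A removed sentence.,EA_PLACEHOLDER,EI_PLACEHOLDER
doc_a,s2,t5,The revised sentence.,The original sentence.,EA_PLACEHOLDER,EI_PLACEHOLDER
"""

rows = read_edits(io.StringIO(SAMPLE))
shape_counts = Counter(edit_shape(r) for r in rows)
```

To use it on the real file, pass an open handle instead of the in-memory sample, e.g. `open(path, newline="", encoding="utf-8")`, where `path` points to edits_s.csv in the 'data_auto' directory. The 'ea' and 'ei' columns already carry the labels; `edit_shape` is only a sanity check against the text fields.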

   > docs: document versions in ITG format. The corresponding annotations can be found in the CSV files within the 'data_auto' directory, identified by the 'doc_name' attribute.
     NOTE: the ARR documents from NLPeer contain only two versions (v1.json and v2.json), whereas the FRD documents from F1000RD may contain more than two versions. For simplicity, we annotated only the first and the last versions of the FRD documents, but kept all versions in the document folders.
     > subfolder structure:
       > v1.json: the original version of the document in ITG format
       > v2.json: the second version of the document in ITG format
       > v<X>.json: the Xth version of the document in ITG format
       > review: the review document(s) in ITG format, if available
   > meta:
     > doc_cat.csv: the document categories. The columns are:
       -- 'doc_name': the unique name of the document,
       -- 'cat': the document category label, which is one of ['nlp', 'case', 'med', 'tool', 'nat', 'soc'],
       -- 'doc_len_s': the number of sentences in the original version of the document.
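
The layout above can be navigated with a short sketch like the following, which picks the first and last version files of a document (the pair that was annotated) and reads doc_cat.csv into a lookup table. It assumes each document lives in a folder `docs/<doc_name>/`; the function names `sort_versions`, `annotated_pair`, and `read_categories` are hypothetical, and the ITG JSON content itself is not parsed here.

```python
import csv
import io
import re
from pathlib import Path

# Category labels as listed for the 'cat' column of doc_cat.csv.
CATEGORIES = {"nlp", "case", "med", "tool", "nat", "soc"}
_VERSION_RE = re.compile(r"^v(\d+)\.json$")


def sort_versions(filenames):
    """Keep only v<X>.json names and sort them numerically by X."""
    numbered = []
    for name in filenames:
        match = _VERSION_RE.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def annotated_pair(docs_root, doc_name):
    """Return paths to the first and last version of one document.

    Assumes the folder layout docs_root/<doc_name>/v<X>.json.
    """
    folder = Path(docs_root) / doc_name
    names = sort_versions(p.name for p in folder.iterdir())
    if len(names) < 2:
        raise ValueError(f"expected at least two versions in {folder}")
    return folder / names[0], folder / names[-1]


def read_categories(handle):
    """Map doc_name -> (cat, doc_len_s) from an open doc_cat.csv handle."""
    categories = {}
    for row in csv.DictReader(handle):
        if row["cat"] not in CATEGORIES:
            raise ValueError(f"unknown category: {row['cat']!r}")
        categories[row["doc_name"]] = (row["cat"], int(row["doc_len_s"]))
    return categories
```

Sorting by the numeric part rather than by string keeps v10.json after v2.json, which matters for FRD documents with many versions. For ARR documents the pair is simply v1.json and v2.json.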

### Cite
[1]. Qian Ruan, Ilia Kuznetsov, and Iryna Gurevych. 2024. Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions. ArXiv, cs.CL/2410.02028. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (Main Long Paper).
@article{ruan2024-llm-classifiers,
      title={Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions},
      author={Qian Ruan and Ilia Kuznetsov and Iryna Gurevych},
      year={2024},
      journal={arXiv preprint arXiv:2410.02028},
      url={https://arxiv.org/abs/2410.02028},}