Topic-Modeling- and Subject-Classification-Analyses of Articles from the EURASIP Journal on Advances in Signal Processing
| dc.contributor.author | Stille, Wolfgang | |
| dc.contributor.author | Freund, Jens | |
| dc.date.accessioned | 2019-11-26T12:44:50Z | |
| dc.date.available | 2019-10-09T12:51:54Z | |
| dc.date.available | 2019-11-26T12:44:50Z | |
| dc.date.created | 2019-09 | |
| dc.date.issued | 2019-11-26 | |
| dc.description | This data set contains the results of topic-modeling and subject-classification analyses of the abstracts of 87 articles from the EURASIP Journal on Advances in Signal Processing (ISSN: 1687-6180). All selected articles had been assigned the keyword “OFDM” (Orthogonal Frequency-Division Multiplexing) by the authors or the publisher. The topic-modeling analyses were carried out with the program GibbsLDA++ (<http://gibbslda.sourceforge.net>), once with and once without stemming (model-final.twords_w_stemming.txt and model-final.twords_wo_stemming.txt, respectively). The program was invoked with the following parameters: src/lda -est -alpha 0.5 -beta 0.1 -ntopics 10 -niters 1000 -savestep 100 -twords 20. The subject-classification analyses were carried out with the web application Annif.org (<http://annif.org/>), which offers several classification algorithms. The following algorithms were used (the name of the corresponding result file is given in brackets): Annif prototype API English (Annif.png), fastText English (fastText.png), Maui English (Maui.png), TF-IDF English (TF-IDF.png), YSO ensemble English (YSO.png). A list of the DOIs of the articles can be found in the file "DOIs_analyzed_articles.txt", and the analyzed abstracts of these articles in the zip archive "Abstracts_EURASIPJAdvSignalProcess.zip". Correction: We noticed that at least 7 of the 87 analyzed abstracts were incomplete in the first version of our data set. What these abstracts had in common was that they contained numbers embedded as inline graphics within an inline-formula element. Due to an error in the preprocessing of the texts, the abstracts were cut off after these elements. For this reason, we repeated the analysis with the complete abstracts. Please note that the graphical numbers themselves are still omitted, so "complete" refers only to the plain text. DOIs of the affected articles: 10.1155/S1110865704403102; 10.1155/S1110865704401140; 10.1155/S1110865704311054; 10.1155/S1110865703309060; 10.1155/S1110865702000884; 10.1155/ASP.2005.2730; 10.1155/ASP.2005.525 | en_US |
| dc.identifier.uri | https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/2090.5 | |
| dc.language.iso | en | en_US |
| dc.relation.isbasedon | EURASIP Journal on Advances in Signal Processing, ISSN: 1687-6180 | |
| dc.rights.license | CC-BY-4.0 (https://creativecommons.org/licenses/by/4.0) | |
| dc.subject | TDM | en_US |
| dc.subject | Text and Data Mining | en_US |
| dc.subject | Topic Modeling | en_US |
| dc.subject | Subject Classification | en_US |
| dc.subject | OFDM | en_US |
| dc.subject | Orthogonal Frequency-Division Multiplexing | en_US |
| dc.subject.classification | 1.14-03 | |
| dc.subject.classification | 4.42-02 | |
| dc.subject.ddc | 621.3 | |
| dc.subject.ddc | 400 | |
| dc.title | Topic-Modeling- and Subject-Classification-Analyses of Articles from the EURASIP Journal on Advances in Signal Processing | en_US |
| dc.type | Text | en_US |
| dc.type | Image | en_US |
| dcterms.accessRights | openAccess | |
| person.identifier.orcid | 0000-0003-4468-4208 | |
| person.identifier.orcid | 0000-0001-6232-7568 | |
| tuda.history.classification | Version=2020-2024;104-04 Applied Linguistics, Experimental Linguistics, Computational Linguistics | |
| tuda.history.classification | Version=2020-2024;408-02 Communications and High-Frequency Engineering, Communication Technology and Networks, Theoretical Electrical Engineering |
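
The truncation described in the correction note (abstracts cut off after an inline-formula element) is typical of text extraction that drops an XML element together with the text following it. The following is a minimal sketch of an extraction that skips the formula but keeps the trailing text. It is not the authors' preprocessing code: the element names `abstract`, `p`, `inline-formula` and `inline-graphic`, the function name `abstract_text`, and the sample sentence are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def abstract_text(xml_string: str) -> str:
    """Return the plain text of an abstract.

    Inline formulas (and the graphics inside them) are skipped,
    but the text that follows each formula is kept.
    """
    root = ET.fromstring(xml_string)
    parts = []

    def walk(elem):
        # Skip the content of inline formulas entirely (assumed tag name).
        if elem.tag != "inline-formula":
            if elem.text:
                parts.append(elem.text)
            for child in elem:
                walk(child)
        # The tail is the text AFTER the element; dropping it is what
        # would cut an abstract off behind a formula.
        if elem.tail:
            parts.append(elem.tail)

    walk(root)
    # Normalize whitespace left behind by the removed formulas.
    return " ".join("".join(parts).split())


# Hypothetical sample, not taken from the data set.
sample = (
    "<abstract><p>A system with "
    '<inline-formula><inline-graphic href="x.gif"/></inline-formula>'
    " subcarriers is studied.</p></abstract>"
)
print(abstract_text(sample))
```

As in the corrected data set, the graphical number itself is still missing from the result; only the surrounding plain text is preserved.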
Files
Original bundle
| Name | Description | Size | Format | |
|---|---|---|---|---|
| model-final.twords_w_stemming.txt | topic modeling with stemming | 3.81 KB | Plain Text | |
| model-final.twords_wo_stemming.txt | topic modeling without stemming | 4.14 KB | Plain Text | |
| Annif_prototype_API_English.PNG | subject classification Annif prototype API English (Annif.org) | 11.49 KB | Portable Network Graphics | |
| fastText_English.PNG | subject classification fastText English (Annif.org) | 10.21 KB | Portable Network Graphics | |
| Maui_English.PNG | subject classification Maui English (Annif.org) | 8.41 KB | Portable Network Graphics | |
| TF-IDF_English.PNG | subject classification TF-IDF English (Annif.org) | 11.51 KB | Portable Network Graphics | |
| YSO-ensemble_English.PNG | subject classification YSO ensemble English (Annif.org) | 11.27 KB | Portable Network Graphics | |
| DOIs_analyzed_articles.txt | list with the DOIs of all analyzed articles | 2.25 KB | Plain Text | |
| Abstracts_EURASIPJAdvSignalProcess_v2.zip | The analyzed 87 abstracts of the articles from the EURASIP Journal on Advances in Signal Processing. All articles had in common that the authors or the publisher had assigned the keyword "OFDM" (Orthogonal Frequency-Division Multiplexing). | 62.52 KB | ZIP archive |
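
For readers who want to process the two `model-final.twords_*` files, the following is a minimal sketch of a reader. It assumes the usual GibbsLDA++ layout, in which each topic starts with a header line such as `Topic 0th:` followed by indented `word probability` lines; the function name `parse_twords` and the sample words and probabilities are made up for illustration and are not taken from the data set.

```python
def parse_twords(text: str) -> dict:
    """Parse a GibbsLDA++ .twords file into {topic header: [(word, prob), ...]}.

    Assumed layout: a header line such as "Topic 0th:" followed by
    indented lines of the form "<word> <probability>".
    """
    topics = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("Topic"):
            current = line.strip().rstrip(":")
            topics[current] = []
        elif current is not None:
            word, prob = line.split()
            topics[current].append((word, float(prob)))
    return topics


# Made-up sample in the assumed layout (not real results).
sample_twords = (
    "Topic 0th:\n"
    "\tchannel   0.05\n"
    "\tofdm   0.04\n"
    "Topic 1th:\n"
    "\tsignal   0.03\n"
)

for topic, words in parse_twords(sample_twords).items():
    print(topic, words)
```

With the parameters given in the description (`-ntopics 10 -twords 20`), each real file should yield 10 topics with 20 words each, which makes a convenient sanity check after parsing.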
Version History
You are currently viewing version no. 5 of the item. This is the most recent version.
| Version | Date | Summary |
|---|---|---|
| 5* | 2019-11-26 13:11:16 | We repeated the TDM analyses because we had noticed that at least 7 of the 87 abstracts analyzed had been incomplete in the first version of the dataset. A more detailed error description can be found in the description section of the new version. |
| | 2019-10-09 14:51:54 | |
* Selected version
