*************README file for****************************************************************
Classification of gravure printed patterns using convolutional neural networks (Python code)
********************************************************************************************

Last modified: 2024-03-15 (yyyy-mm-dd)

This dataset was generated by Pauline Rothmann-Brumm (2023) as part of her dissertation at the Technical University of Darmstadt, Germany.

Title of the dissertation:
Visualisierung, Analyse und Modellierung von fluiddynamischen Musterbildungsphänomenen im Zylinderspalt unter Anwendung von Maschinellem Lernen (German) /
Visualization, analysis and modeling of fluid dynamic pattern formation phenomena in the cylinder gap using machine learning (English translation)

---------------------
DATASET DESCRIPTION

URL: https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3838

This dataset contains Python code ('code_DeepLearn_ImgClass.zip') for the automated classification of gravure printed patterns from the HYPA-p dataset. The developed algorithm performs supervised deep learning of convolutional neural networks (CNNs) on labeled data ('CNN_dataset.zip'), i.e. selected, labeled 'S-subfields' from the HYPA-p dataset (see https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3841). 'CNN_dataset.zip' is a subset of the images in the folder 'labeled_data.zip'; it can be created with the provided Python code. PyTorch is used as the deep learning framework. The Python code yields trained CNNs, which can be used for the automated classification of unlabeled data from the HYPA-p dataset. Well-known, pre-trained network architectures such as Densenet-161 or MobileNetV2 are used as a starting point for training. Several trained CNNs are included in this submission, see 'trained_CNN_models.zip'.

---------------------
PYTHON CODE

Step by step guide:
1. Download the folder 'code_DeepLearn_ImgClass.zip' from the TUdatalib submission 'Classification of gravure printed patterns using convolutional neural networks (Python code)' (https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3838) and unzip it. 'DeepLearn_ImgClass' stands for Deep Learning of Convolutional Neural Networks (CNNs) for Image Classification.

2. Download the data folder 'labeled_data.zip', unzip it and copy it into the folder 'code_DeepLearn_ImgClass'.

3. Set up a suitable Python environment by installing the file 'requirements.yml' from the folder 'code_DeepLearn_ImgClass' with the conda package manager:
   conda env create -f requirements.yml

4. Open the script 'CNN_create_random_dataset.py' and enter the base directory path in line 17 (the path to the folder 'code_DeepLearn_ImgClass'). Run the script. This fills the previously empty destination folder 'CNN_dataset' with randomly selected data from the folder 'labeled_data'. The content of the folder 'CNN_dataset' serves from now on as training, validation and testing data for the training of CNNs.
   NOTE: Instead of running the script 'CNN_create_random_dataset.py', you can also download the folder 'CNN_dataset.zip' from TUdatalib (https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3838), unzip it and copy it into the folder 'code_DeepLearn_ImgClass', thus overwriting the empty folder 'CNN_dataset'.

5. Open the script 'CNN_train_validate_hyperparam_tuning.py' and enter the base directory path in line 32, or comment out line 32 and uncomment line 33. Select a CNN-model and enter its name in line 194. Run the script to perform hyperparameter tuning in TensorBoard for the selected CNN-model with the selected hyperparameter combinations and other settings as defined in lines 38 to 45. Inspect the results in TensorBoard and select the best hyperparameters (batch size and learning rate).
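The random dataset creation in step 4 can be sketched as follows. This is a minimal standard-library sketch, not the actual script: the class subfolder names ('dots', 'fingers', 'mixed'), the '*.png' file extension and the split fractions are assumptions; 'CNN_create_random_dataset.py' defines its own layout and selection logic.

```python
import random
import shutil
from pathlib import Path

def create_random_dataset(base_dir, classes=("dots", "fingers", "mixed"),
                          splits=(("train", 0.70), ("val", 0.15), ("test", 0.15)),
                          seed=42):
    """Copy a random selection of labeled images into CNN_dataset/<split>/<class>."""
    rng = random.Random(seed)  # fixed seed -> reproducible selection
    src_root = Path(base_dir) / "labeled_data"
    dst_root = Path(base_dir) / "CNN_dataset"
    for cls in classes:
        images = sorted((src_root / cls).glob("*.png"))
        rng.shuffle(images)
        start = 0
        for i, (split, frac) in enumerate(splits):
            # the last split takes the remainder, so every image is used exactly once
            end = len(images) if i == len(splits) - 1 else start + round(frac * len(images))
            out_dir = dst_root / split / cls
            out_dir.mkdir(parents=True, exist_ok=True)
            for img in images[start:end]:
                shutil.copy2(img, out_dir / img.name)
            start = end
```

Shuffling a sorted file list with a seeded generator keeps the split reproducible across runs while still being random with respect to the image content.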
6. Open the script 'CNN_train_validate.py', enter the base directory path in line 37 and the selected CNN-model in line 48. In lines 53 and 54, fill in the batch size and learning rate that resulted from the hyperparameter tuning in step 5. Run the script to train and validate the selected CNN-model. As one result, you obtain the validation accuracy of the trained CNN-model, printed to the terminal. In addition, the complete training history (training loss, validation loss, training accuracy and validation accuracy for each epoch) is saved in the file 'results.cvs' in the folder 'Results' in the base directory. The trained CNN-model is saved in the file 'model.pth'.
   IMPORTANT: If you train several CNN-models one after another, 'model.pth' is overwritten and the training history is appended to the existing 'results.cvs' file. To prevent this, clear the folder 'Results' after each training and save the files in another directory.

7. Open the script 'CNN_test.py' and enter the base directory path in line 24. Run the script. (IMPORTANT: The testing data was not used for the training of the CNN-model.) As one result, you obtain a confusion matrix (saved as 'confmat.png' in the base directory and printed to the terminal), from which you can compute the test accuracy of the trained CNN-model. In addition, the false predictions are saved in the file 'Falsepredicts.csv' in the base directory.

8. Open the script 'CNN_inference.py' and fill in the directory path of the trained CNN-model in line 34. Choose 'True' or 'False' for the parameters 'strict' and 'sort_images_in_savedir' in lines 35 and 37. Run the script to use the trained CNN-model for the automatic classification of unlabeled images. Classifying images without a known class using a trained CNN-model is called inference. First, a file dialog opens, in which you choose the input folder of the images that you want to classify.
   Second, another file dialog opens, in which you choose the folder for saving the classification results. As a result, you obtain the file 'classificationlog.csv' in the chosen saving folder, which lists the predicted class of each input image. If you set 'sort_images_in_savedir' to 'True', the input images are also copied to the folder 'Classification' (in the chosen saving folder), where they are sorted into subfolders by predicted class ('dots', 'fingers' or 'mixed').

---------------------
DOWNLOAD

Before downloading the dataset, please make sure that your hard drive has enough free space, since some files of the dataset are very large! If necessary, change the download path within your browser to an external hard drive.

---------------------
CONTACT

Pauline Rothmann-Brumm
Technical University of Darmstadt, Department of Mechanical Engineering,
Institute of Printing Science and Technology (IDD),
Magdalenenstr. 2, 64289 Darmstadt, Germany
rothmann-brumm@idd.tu-darmstadt.de
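As a supplement to step 7: the test accuracy can be computed from the confusion matrix as the number of correct predictions (the diagonal) divided by the total number of test images. Below is a minimal, framework-free sketch; the class names follow step 8, while the actual layout of 'confmat.png' produced by 'CNN_test.py' may differ.

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """Rows correspond to the true class, columns to the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    mat = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        mat[index[t]][index[p]] += 1
    return mat

def accuracy_from_confmat(mat):
    """Correct predictions (diagonal entries) divided by all predictions."""
    correct = sum(mat[i][i] for i in range(len(mat)))
    total = sum(sum(row) for row in mat)
    return correct / total

# Illustrative (invented) labels for five test images:
classes = ("dots", "fingers", "mixed")
true = ["dots", "dots", "fingers", "mixed", "mixed"]
pred = ["dots", "fingers", "fingers", "mixed", "dots"]
mat = confusion_matrix(true, pred, classes)
print(accuracy_from_confmat(mat))  # 3 of 5 predictions are correct -> 0.6
```

The off-diagonal entries of the matrix show which classes are confused with which, which is why the README recommends inspecting 'confmat.png' rather than the accuracy alone.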