*************README file for************************************************************************************
Classification of gravure printed patterns using singular value decomposition and machine learning (MATLAB code)
****************************************************************************************************************

Last modified: 2024-09-25 (yyyy-mm-dd)

This dataset was generated by Pauline Rothmann-Brumm (2023) as part of her dissertation (https://doi.org/10.26083/tuprints-00026770) at the Technical University of Darmstadt, Germany.

Title of the dissertation:
Visualisierung, Analyse und Modellierung von fluiddynamischen Musterbildungsphänomenen im Zylinderspalt unter Anwendung von Maschinellem Lernen (German) /
Visualization, analysis and modeling of fluid dynamic pattern formation phenomena in the cylinder gap using machine learning (English translation)

---------------------
DATASET DESCRIPTION

URL: https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3843

This dataset contains MATLAB code ('code_MachLearn_ImgClass.zip') for automated classification of gravure printed patterns from the HYPA-p dataset (https://doi.org/10.48328/tudatalib-1150). The developed algorithm performs singular value decomposition (SVD) and trains several machine learning classifiers, such as k-Nearest Neighbors (kNN). The classifiers are trained and tested on labeled data. Afterwards, the trained classifiers can be used for automated classification of unlabeled data.

---------------------
MATLAB CODE - STEP BY STEP GUIDE

1. Note:
   a) Unless a specific download link is given, always download from https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/3843.
   b) Before downloading the datasets, make sure that your hard drive has enough free space, since some files of the dataset are very large! If necessary, change the download path in your browser to an external hard drive.

2. Download the folder 'code_MachLearn_ImgClass.zip' and unzip it.
   The content of this zip folder is listed in 'MATLAB CODE - FILES' below. 'MachLearn_ImgClass' stands for Machine Learning for Image Classification.

3. The scripts need some additional functions. Download the library 'matlab2tikz' by Nico Schlömer (https://github.com/matlab2tikz/matlab2tikz) and the library 'boundedline' by Kelly Kearney (https://github.com/kakearney/boundedline-pkg). Move both libraries to the folder '..\code\functions'.

Now we are nearly ready to run the scripts in logical order of execution (see 'MATLAB CODE - FILES' below). Each script, however, has some further prerequisites.

4. For 'SVD_on_complete_data.m':
   a) Dataset:
      - Option A (recommended): Download the data matrices 'dots_all.mat', 'mixed_all.mat' and 'fingers_all.mat'. Make sure your drive has enough free space! Move the data matrices to the folder '..\data\image_data\labeled\processed'.
      - Option B: Download the data folder 'labeled_data.zip' from https://doi.org/10.48328/tudatalib-1147 and unzip it. Make sure your drive has enough free space! The unzipped folder contains the subfolders 'dots', 'mixed' and 'fingers'. Move these three subfolders into the folder '..\data\image_data\labeled\raw'. From this raw data you can later create the processed data 'dots_all.mat', 'mixed_all.mat' and 'fingers_all.mat' using the function 'create_batches.m', which is called in the scripts 'SVD_on_complete_data.m' and 'classifier_training_loops.m'.
   b) Singular value decomposition (SVD):
      - Option A (recommended): Download the folder 'SVD_data.zip'. Make sure your drive has enough free space! The folder contains the results of the singular value decomposition (SVD), i.e. the matrices 'S.mat', 'S_hat.mat', 'U.mat', 'U_hat.mat', 'V.mat', and 'V_hat.mat'. Unzip the folder and move all the matrices to the folder '..\data\SVD_data'. Use this option if you do not want to compute the SVD yourself (the computation needs a PC with a strong CPU and a lot of RAM!).
      - Option B: If you want to compute the SVD yourself, you do not need to download any files. The files 'S.mat', 'S_hat.mat', 'U.mat', 'U_hat.mat', 'V.mat', and 'V_hat.mat' are created automatically within the script 'SVD_on_complete_data.m' by performing SVD on the dataset.

   Now you can run 'SVD_on_complete_data.m'. It performs SVD on the complete dataset of 26880 labeled, grayscale images of printed patterns. The image labels are 'dots' (class 1), 'mixed' (class 2) and 'fingers' (class 3). The dataset is rearranged into a data matrix before performing the SVD. The SVD is used for a dimensionality reduction of the dataset. As a result of the SVD, the singular values as well as the first 64 SVD-modes are plotted.

5. For 'classifier_training_loops.m':
   Dataset: Same as in step 4a.

   Now you can run 'classifier_training_loops.m'. It performs training and testing of several machine learning models for the classification of printed patterns within several training loops. It uses 26880 labeled grayscale images of patterns, which are rearranged into a data matrix. The labels are 'dots' (class 1), 'mixed' (class 2) and 'fingers' (class 3). The image data is divided into a training set and a test set. A randomized singular value decomposition (rSVD) is performed for dimensionality reduction of the image data. Optionally, a fast Fourier transform (FFT) is applied to the image data before performing the rSVD. Afterwards, several machine learning models are trained and tested within several training loops, e.g. k-nearest neighbor (kNN), classification tree, naive Bayes etc. The script systematically investigates the influence of several factors on the classification accuracy, e.g. the application of FFT or dataset balancing. Other factors are investigated as well.

6. For 'kNN_training.m':
   Dataset: Same as in step 4a.

   Now you can run 'kNN_training.m'. It performs training and testing of a k-nearest-neighbor (kNN) machine learning model for the classification of printed patterns.
   The script uses 26880 labeled grayscale images of patterns, which are rearranged into a data matrix. The labels are 'dots' (class 1), 'mixed' (class 2) and 'fingers' (class 3). First, a singular value decomposition (SVD) is performed for dimensionality reduction of the image data. Optionally, a fast Fourier transform (FFT) is applied to the image data before performing the SVD. Afterwards, the kNN-model is trained and tested.

7. For 'predict_classes.m':
   Dataset:
      - Option A (recommended): Download 'S-subfields_B3-01.mat' and/or 'S-subfields_B3-05.mat'. Make sure your drive has enough free space! Move to '..\data\image_data\unlabeled\processed\'.
      - Option B: Download the folders 'S-subfields_B3-01.zip' and/or 'S-subfields_B3-05.zip' from the HYPA-p dataset (https://doi.org/10.48328/tudatalib-1150). Make sure your drive has enough free space! Extract to '..\data\image_data\unlabeled\raw\'. Delete the files 'InputList_S-subfields_B3-01.mat' and/or 'InputList_S-subfields_B3-05.mat' from the folder '..\data\other'. The script 'predict_classes.m' will automatically create the files 'S-subfields_B3-01.mat' and/or 'S-subfields_B3-05.mat' as well as 'InputList_S-subfields_B3-01.mat' and/or 'InputList_S-subfields_B3-05.mat' from the raw data.
   Model: The k-nearest neighbor (kNN) model, which is used in the script 'predict_classes.m', was created within the script 'kNN_training.m' and saved to the folder '..\data\trained_model'. An exemplary kNN model is provided there, in case you do not want to run 'kNN_training.m'.

   Now you can run 'predict_classes.m'. It performs classification of unlabeled data using a kNN model, which was trained on labeled images of printed patterns in the script 'kNN_training.m'. The classification results are used for the creation of regime maps. These maps correlate printing process parameters and pattern class and thus provide insights into the dynamics of fluid splitting in gravure printing.
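
For orientation, the general idea behind steps 5 to 7 (data matrix, randomized SVD for dimensionality reduction, kNN classification in the reduced space) can be sketched in a few lines of MATLAB. This is only a minimal illustration under stated assumptions and not the code from 'code_MachLearn_ImgClass.zip': it uses small synthetic data instead of the printed-pattern images, an inline randomized SVD instead of the provided 'rsvd.m', and a hand-written nearest-neighbor vote instead of the trained models. All variable names and parameter values (nPix, nTrain, r, p, q, k, ...) are hypothetical.

```matlab
% Minimal sketch, NOT the repository's scripts: randomized SVD followed by a
% hand-written k-nearest-neighbor vote on synthetic data.
% All names and values below are illustrative assumptions.
nPix   = 400;   % pixels per flattened image (hypothetical)
nTrain = 60;    % training images per class (hypothetical)
nTest  = 20;    % test images per class (hypothetical)
nClass = 3;     % 1 = dots, 2 = mixed, 3 = fingers
r = 5;          % target rank of the reduced space
p = 5;          % oversampling for the random projection
q = 1;          % number of power iterations
k = 3;          % number of neighbors
noise = 0.1;    % noise level of the synthetic images

% Synthetic stand-in for the image data: one flattened image per column,
% each class being a fixed random template plus small noise.
templates = randn(nPix, nClass);
Xtrain = zeros(nPix, nClass*nTrain);  ytrain = zeros(nClass*nTrain, 1);
Xtest  = zeros(nPix, nClass*nTest);   ytest  = zeros(nClass*nTest, 1);
for c = 1:nClass
    iTr = (c-1)*nTrain + (1:nTrain);
    iTe = (c-1)*nTest  + (1:nTest);
    Xtrain(:, iTr) = repmat(templates(:, c), 1, nTrain) + noise*randn(nPix, nTrain);
    Xtest(:, iTe)  = repmat(templates(:, c), 1, nTest)  + noise*randn(nPix, nTest);
    ytrain(iTr) = c;
    ytest(iTe)  = c;
end

% Randomized SVD of the training matrix: sample its column space with a
% random matrix, orthonormalize, then take the SVD of the small projection.
P = randn(size(Xtrain, 2), r + p);
Z = Xtrain*P;
for i = 1:q
    Z = Xtrain*(Xtrain'*Z);          % power iteration sharpens the spectrum
end
[Q, ~] = qr(Z, 0);                   % economy-size QR
[UY, S, V] = svd(Q'*Xtrain, 'econ');
U = Q*UY;
U = U(:, 1:r);                       % keep the first r modes

% Project training and test images onto the r modes.
Ftrain = U'*Xtrain;
Ftest  = U'*Xtest;

% kNN: majority vote among the k closest training points (Euclidean distance).
ypred = zeros(size(ytest));
for j = 1:size(Ftest, 2)
    d = sum((Ftrain - repmat(Ftest(:, j), 1, size(Ftrain, 2))).^2, 1);
    [~, idx] = sort(d);
    ypred(j) = mode(ytrain(idx(1:k)));
end
accuracy = mean(ypred == ytest);
fprintf('Test accuracy on synthetic data: %.2f\n', accuracy);
```

The modes are computed from the training images only and the test images are merely projected onto them, so no information from the test set enters the dimensionality reduction. With real images, the provided scripts remain the reference; the sketch above only shows how the pieces fit together.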
---------------------
MATLAB CODE - FILES

Content of the folder 'code_MachLearn_ImgClass.zip':

Scripts (in logical order of execution):
   SVD_on_complete_data.m
   classifier_training_loops.m
   kNN_training.m
   predict_classes.m

Functions:
   import_functions.m (imports the other functions)
   check_files.m
   create_batches.m
   create_unknown_batches.m
   display_distribution.m
   display_distribution_split.m
   display_elapsedTime.m
   plot_errors_allTargetRanks.m
   plot_errors_oneTargetRank.m
   plot_recall_oneTargetRank.m
   plot_regime_maps_mod.m
   plotAndSave_rSVDmodes.m
   plotAndSave_singularValues.m
   plotAndSave_SVDmodes.m
   rsvd.m (by Brunton and Kutz (2022). Data-driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press.)
   save_var.m

In the folder '..\data\other' you additionally find:
   BlueColormapOpt.mat (colormap for plotting of SVD modes)
   InputList.mat (list of all labeled images)
   InputList_S-subfields_B3-01.mat (list of unlabeled images from experiment B3-01)
   InputList_S-subfields_B3-05.mat (list of unlabeled images from experiment B3-05)

In the folder '..\data\trained_model' you additionally find:
   kNN_model.mat (example trained kNN model)
   rSVD_modes.mat (example rSVD modes)

In the folder '..\documentation\results\classifier_results' you additionally find:
   model_1.mat (trained kNN models from dissertation of Pauline Rothmann-Brumm)
   model_2.mat
   model_3.mat
   ...
   model_7.mat
   Model_parameters.csv (overview of model parameters)

---------------------
ACKNOWLEDGMENT

I gratefully acknowledge the financial support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 265191195 – Collaborative Research Center 1194 (CRC 1194) ‘Interaction between Transport and Wetting Processes,’ project C01. I want to thank Isabel Scherl and Steven L. Brunton for their input on reduced-order modeling and singular value decomposition. I also thank Nathanael Feutner for supporting the code development.
---------------------
CONTACT

Pauline Rothmann-Brumm
Technical University of Darmstadt, Department of Mechanical Engineering, Institute of Printing Science and Technology (IDD),
Magdalenenstr. 2, 64289 Darmstadt, Germany
rothmann-brumm@idd.tu-darmstadt.de