Perform Machine Learning Classification


What is Weka

Named after a flightless New Zealand bird, Weka is a set of machine learning algorithms that can be applied to a data set directly, or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.



Classification

In machine learning and statistics, classification is the problem of identifying which of a set of categories (sub-populations) a new observation belongs to, on the basis of a training set of data containing observations (or instances) whose category membership is known. Examples include assigning a given email to the "spam" or "non-spam" class, or assigning a diagnosis to a patient based on observed characteristics (gender, blood pressure, presence or absence of certain symptoms, etc.). Classification is an example of pattern recognition [Wikipedia].
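To make the definition concrete, here is a minimal Python sketch of about the simplest classifier possible: a majority-class baseline, which is the idea behind the ZeroR algorithm used in the Weka steps later in this post. The labels, the feature name `contains_link`, and both function names are made up for illustration; this is not Weka's implementation.

```python
from collections import Counter


def train_majority_baseline(labels):
    """Return the most frequent class label in the training set."""
    if not labels:
        raise ValueError("training set must contain at least one label")
    return Counter(labels).most_common(1)[0][0]


def classify(majority_label, observations):
    """Assign the majority label to every new observation, ignoring its features."""
    return [majority_label for _ in observations]


# Hypothetical training labels for the spam example.
training_labels = ["spam", "non-spam", "non-spam", "spam", "non-spam"]
model = train_majority_baseline(training_labels)

# Hypothetical new observations; this baseline never looks at the features.
new_emails = [{"contains_link": True}, {"contains_link": False}]
predictions = classify(model, new_emails)
print(predictions)  # ['non-spam', 'non-spam']
```

A baseline like this is mainly useful as a reference point: any real classifier should do better than always predicting the most common class.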

Steps to perform classification using Weka

In this example we have two data sets: 'training_data', which is used to train the model, and 'test_data', which is used for the actual classification of the flood traffic. The steps below can be followed to classify any pair of training and test data sets.

1. Save both data sets as CSV files so they can be opened in Weka.
2. Open each data set in Weka, click Classify, choose the ZeroR classifier, and select "Use training set" as the test option.
3. Click Start to run the algorithm. Then right-click the entry in the result list, choose "Visualize classifier errors", and save the data.arff and predict.arff files.
4. Open training_set.arff and training_set.csv in Notepad++, copy the data rows from the CSV file, and paste them into the data.arff file.
5. Delete the @predict attribute from the data.arff file.
6. Open test_set.csv and test_set.arff in Notepad++ and copy the data rows from the CSV file into the ARFF file.
7. Open training_set.arff and test_set.arff in Notepad++, copy the attribute declarations from training_set.arff, and paste them into test_set.arff under the @relation line.
8. Put a question mark at the end of each data row in test_set.arff, so that the class value is marked as unknown.
9. Open training_set.arff in Weka, click Classify, choose multilayer regression, select 10-fold cross-validation, and click Start. Then click "Save model" and save it under the name model.
10. Open test_set.arff, click Classify, choose multilayer regression, click "Load model", and select the model saved in the previous step.
11. Select "Supplied test set" and choose the predict.arff file.
12. Click "Re-evaluate model on current test set", then open the test_set.arff file in Weka.
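
The manual copy-and-paste editing in the steps above (moving CSV rows into an ARFF file, reusing the training attributes for the test file, and marking the test class values with a question mark) can also be scripted. The following is a rough Python sketch under several assumptions that are not from the original workflow: the class label is the last CSV column, every other column is numeric, and the column names (`packets`, `duration`, `label`) and class values (`normal`, `flood`) are invented for illustration.

```python
import csv
import io


def csv_to_arff(csv_text, relation, class_values, mask_class=False):
    """Convert CSV text (header row first, class label last) to ARFF text.

    Assumes every non-class column is numeric. When mask_class is True, each
    class value is replaced by '?', the ARFF marker for a missing value.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    lines = ["@relation " + relation, ""]
    for name in header[:-1]:
        lines.append("@attribute " + name + " numeric")
    # The class attribute is nominal: its allowed values are listed in braces.
    lines.append("@attribute " + header[-1] + " {" + ",".join(class_values) + "}")
    lines += ["", "@data"]

    for row in data:
        class_value = "?" if mask_class else row[-1]
        lines.append(",".join(row[:-1] + [class_value]))
    return "\n".join(lines) + "\n"


# Hypothetical data: two training rows and one test row.
training_csv = "packets,duration,label\n120,0.5,normal\n9000,0.1,flood\n"
test_csv = "packets,duration,label\n8000,0.2,flood\n"
classes = ["normal", "flood"]

# Both files get the same attribute declarations, as in step 7.
train_arff = csv_to_arff(training_csv, "training_set", classes)
test_arff = csv_to_arff(test_csv, "test_set", classes, mask_class=True)
print(test_arff)
```

Because the same function writes both headers, the training and test files end up with identical attribute declarations, which Weka needs in order to evaluate a saved model on a supplied test set.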
