What is classification?
Classification is a data mining method that assigns the items in a data set to target categories, or classes.
What is the basis of classification?
The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to classify loan applicants as low, medium, or high credit risks.
A classification process starts with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed from data observed for many loan applicants over a period of time.
In addition to the historical credit rating, the data may track employment history, home ownership or rental, years of residence, number and type of investments, and so on. The credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.
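The terms target, predictor, and case can be made concrete with a small sketch. The records, field names, and values below are invented for illustration and do not come from any real data set.

```python
# Hypothetical loan-applicant records; every field name and value is invented.
cases = [
    {"years_of_residence": 8, "home": "own", "investments": 3, "credit_rating": "low"},
    {"years_of_residence": 1, "home": "rent", "investments": 0, "credit_rating": "high"},
    {"years_of_residence": 4, "home": "rent", "investments": 1, "credit_rating": "medium"},
]

# Assumption: the attribute to be predicted is stored under this key.
TARGET = "credit_rating"


def split_case(case):
    """Separate one case into its predictors and its target value."""
    predictors = {name: value for name, value in case.items() if name != TARGET}
    return predictors, case[TARGET]


first_predictors, first_target = split_case(cases[0])
print(first_predictors)  # every attribute except the credit rating
print(first_target)      # the class the model should learn to predict
```

Each dictionary is one case, the credit rating is the target, and everything else in the dictionary is a predictor.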
What are the benefits of classification?
Classes are discrete and do not imply order. Continuous, floating-point values would indicate a numerical target rather than a categorical one. A predictive model with a numerical target uses a regression algorithm, not a classification algorithm. Binary classification is the simplest form of the classification problem.
In binary classification, the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating.
In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques to find these relationships. The relationships are summarized in a model, which can then be applied to a different data set in which the class assignments are unknown.
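As one illustration of training followed by application, the sketch below uses a nearest-centroid classifier. This technique is chosen here only because it is short; the text does not name a specific algorithm, and the training cases (years of residence, number of investments) are invented.

```python
import math
from collections import defaultdict


def train(cases):
    """Build a model: the mean predictor vector (centroid) of each class."""
    grouped = defaultdict(list)
    for predictors, target in cases:
        grouped[target].append(predictors)
    return {
        target: tuple(sum(column) / len(rows) for column in zip(*rows))
        for target, rows in grouped.items()
    }


def predict(model, predictors):
    """Assign the class whose centroid is closest to the given predictors."""
    return min(model, key=lambda target: math.dist(model[target], predictors))


# Invented cases: (years_of_residence, number_of_investments) -> credit risk
training_cases = [
    ((10, 4), "low"),
    ((8, 2), "low"),
    ((1, 0), "high"),
    ((3, 2), "high"),
]

model = train(training_cases)        # summarises the relationships found
print(model)
print(predict(model, (9, 2)))        # a new case whose class is unknown
```

The `model` dictionary plays the role of the summarized relationships: once it is built, the training cases are no longer needed to classify new data.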
Classification models are tested by comparing the predicted values with the known target values in a set of test data.
Usually, the historical data for a classification project is divided into two data sets: one for building the model and the other for testing it. Scoring a classification model results in a class assignment and probabilities for each case. For example, a model that classifies customers as low, medium, or high value would also predict the probability of each classification for each customer.
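Scoring can be sketched with a deliberately simple frequency-based model that returns both a class and a probability for every class. The historical records and the single predictor (home ownership) are invented, and real classification algorithms estimate probabilities in more sophisticated ways.

```python
from collections import Counter

# Invented historical cases: (home ownership, customer value class)
history = [
    ("own", "high"), ("own", "high"), ("own", "medium"), ("own", "low"),
    ("rent", "low"), ("rent", "low"), ("rent", "medium"), ("rent", "high"),
]


def score(home_status):
    """Return (predicted class, probability per class) for one case.

    Assumes home_status appears at least once in the historical data.
    """
    counts = Counter(target for home, target in history if home == home_status)
    total = sum(counts.values())
    probabilities = {target: n / total for target, n in counts.items()}
    predicted = max(probabilities, key=probabilities.get)
    return predicted, probabilities


predicted, probabilities = score("own")
print(predicted)
print(probabilities)
```

The predicted class is simply the one with the highest estimated probability; the full probability dictionary is what lets a business rank customers rather than only label them.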
Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling.
Testing a Classification Model
A classification model is tested by applying it to test data with known target values and comparing the predicted values with the known values.
The test data must be compatible with the data used to build the model and must be prepared in the same way that the build data was prepared. Build data and test data typically come from the same historical data set.
A percentage of the records is used to build the model; the remaining records are used to test it.
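Dividing one historical data set into a build (construction) set and a test set can be sketched as follows. The 70/30 ratio, the fixed seed, and the function name `split_records` are assumptions made for this example, not values prescribed by the text.

```python
import random


def split_records(records, build_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into build and test sets."""
    shuffled = list(records)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)     # seeded so the split is repeatable
    cut = round(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(10))                     # stand-ins for ten historical cases
build_set, test_set = split_records(records)
print(len(build_set), len(test_set))
```

Shuffling before cutting avoids a split that merely reflects the order in which the records were collected, and every record ends up in exactly one of the two sets.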
Test metrics are used to measure how well the model predicts the known values. If the model performs well and meets the business requirements, it can then be applied to new data to predict the future.
Accuracy is the percentage of the model’s predictions that match the actual classifications in the test data.
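Accuracy is the number of correct predictions divided by the total number of test cases. A minimal sketch follows, using invented predicted and actual labels.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known classifications."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test case is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Invented test results: three of the five predictions are correct.
actual_labels = ["low", "high", "medium", "low", "high"]
predicted_labels = ["low", "high", "low", "low", "medium"]

print(f"{accuracy(predicted_labels, actual_labels):.0%}")  # prints "60%"
```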
SUMMARY OF MODULE 04:
- The goal of classification is to accurately predict the target class for each case in the data.
- A classification process starts with a data set in which the class assignments are known.
- Binary classification is the simplest form of the classification problem.
- Usually, the historical data for a classification project is divided into two data sets: one for building the model and the other for testing it.