The last stage of the AI Project Cycle is Evaluation. Once the model or project is ready after problem scoping, data acquisition, data exploration and modelling, the final stage is to evaluate whether it is ready to be put into action.
Evaluation helps to check whether one model performs better than another. It is an essential part of the development process: it helps to determine the best model for the data and gives an idea of how well the model will work on future, unseen data.
As we know, we have two kinds of datasets: the training dataset, which is used to build the model, and the testing dataset, which is kept aside for evaluating it.
Consider a scenario where you have an AI model that predicts the possibility of fire in a forest. The main aim of this model is to predict whether a forest fire has broken out or not. To understand whether the model is working properly, we need to check if the predictions made by the model are correct or not.
For this, two terms are compared:
Prediction: The output given by the machine after it has been trained and tested on the data.
Reality: The real situation in the forest at the time the prediction is made.
There are four possible cases, and each has its own term:
Condition 1 – Prediction – Yes, Reality – Yes (True Positive)
This condition arises when the prediction and the reality are both Yes, i.e. the model predicted a forest fire and a fire has actually broken out. This is known as a True Positive.
Condition 2 – Prediction – No, Reality – No (True Negative)
If there is no fire in the forest and the machine also correctly predicts No, this condition is known as a True Negative.
Condition 3 – Prediction – Yes, Reality – No (False Positive)
There is no fire in reality, but the machine has incorrectly predicted Yes. This condition is known as a False Positive.
Condition 4 – Prediction – No, Reality – Yes (False Negative)
A forest fire has broken out in reality, but the machine has incorrectly predicted No. This condition is known as a False Negative.
The confusion matrix is a comparison between prediction and reality. It helps us understand the prediction results. It is not an evaluation metric in itself, but a record that helps in evaluation. The four conditions explained above form its entries.
Prediction and reality can be mapped together with the help of a confusion matrix, as shown below:

                     Reality: Yes               Reality: No
Prediction: Yes      True Positive (TP)         False Positive (FP)
Prediction: No       False Negative (FN)        True Negative (TN)
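As an illustration, here is a minimal Python sketch (the prediction and reality lists are made-up example data) that counts the four cases described above:

```python
# Minimal sketch: counting the four confusion-matrix cases.
# 1 means "fire", 0 means "no fire"; the lists are hypothetical example data.
predictions = [1, 0, 0, 1, 0, 1, 0, 0]
reality     = [1, 0, 1, 0, 0, 1, 0, 0]

tp = tn = fp = fn = 0
for pred, real in zip(predictions, reality):
    if pred == 1 and real == 1:
        tp += 1  # True Positive: predicted fire, and a fire really broke out
    elif pred == 0 and real == 0:
        tn += 1  # True Negative: predicted no fire, and there was no fire
    elif pred == 1 and real == 0:
        fp += 1  # False Positive: predicted fire, but there was no fire
    else:
        fn += 1  # False Negative: predicted no fire, but a fire broke out

print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```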
Based on these four cases, the model can be evaluated using the following metrics: Accuracy, Precision, Recall and F1 Score.
1. Accuracy
Accuracy is the percentage of correct predictions out of all the observations. The total observations cover all the possible cases of prediction: True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN).
Accuracy = (TP + TN) / (TP + TN + FP + FN)
In other words, Accuracy tells us how often the model's predictions match reality. Now let's return to the forest fire example.
Suppose the model always predicts that there is no fire, while in reality there is a 2% chance of a forest fire breaking out. Out of 100 cases, the model is right for the 98 cases where there is no fire, but it also predicts "no fire" for the 2 cases where a fire actually breaks out. Hence the elements of the formula are: TP = 0, TN = 98, FP = 0 and FN = 2.
Therefore, Accuracy = (TP + TN) / 100 = (0 + 98) / 100 = 98%.
This gives a high accuracy for the AI model, but the actual cases where the fire broke out are not taken into account at all. Therefore there is a need to look at another parameter that takes such cases into account as well.
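Here is a small sketch of that calculation, using the hypothetical 100-case split from the example above:

```python
# Hypothetical forest-fire example: 100 cases, 2 real fires,
# and a model that always predicts "no fire".
tp, tn, fp, fn = 0, 98, 0, 2

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Accuracy = {accuracy:.0%}")  # 98%, yet not a single real fire is detected
```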
2. Precision
Precision is the fraction of positive predictions that are actually correct, i.e. out of all the cases where the model predicted Yes, how many times the prediction matched reality:
Precision = TP / (TP + FP)
Now return to the forest fire example and assume that the model always predicts that there is a forest fire, irrespective of the reality. In this case all the positive predictions have to be considered, i.e. the True Positives (TP) and the False Positives (FP): every time the alarm goes off, the firefighters go and check whether it was true or false.
Recall the story of the boy who cried wolf: he falsely cried out about a wolf so many times that when the wolf actually came, nobody rescued him. Similarly, if Precision is low (many false alarms), the firefighters would become complacent and might not go and check every time, assuming it is just another false alarm.
So if Precision is high, it means that most of the positive predictions are True Positives, i.e. there are fewer false alarms.
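A quick sketch of the Precision formula, with made-up counts:

```python
# Hypothetical counts: the model raised 10 alarms, of which 6 were real fires.
tp, fp = 6, 4

precision = tp / (tp + fp)
print(f"Precision = {precision:.0%}")  # 60% of the alarms were true
```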
3. Recall
Recall is the fraction of positive cases that are correctly identified. It takes into account the cases where, in reality, there was a fire, and checks whether the machine detected it or not. That is, it considers the True Positives (fires that were correctly detected) and the False Negatives (fires that were missed):
Recall = TP / (TP + FN)
As we can observe, the numerator in both Precision and Recall is the same: the True Positives. The difference lies in the denominator: Precision counts the False Positives, while Recall counts the False Negatives. In a case like the forest fire, False Negatives can be very costly, because a missed fire can destroy the whole forest.
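A matching sketch for Recall, again with hypothetical counts:

```python
# Hypothetical counts: 8 real fires broke out, 6 were detected and 2 were missed.
tp, fn = 6, 2

recall = tp / (tp + fn)
print(f"Recall = {recall:.0%}")  # 75% of the real fires were caught
```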
Sometimes a False Positive can also cost us more than a False Negative. For example, if a spam filter wrongly marks an important mail as spam (a False Positive), we might miss that mail entirely, while letting one spam mail through (a False Negative) is only a minor annoyance. You can think of more examples of both kinds of cases.
So, to know whether the performance of your model is good, you need both of these measures: Precision and Recall.
In some cases a model may have high Precision but low Recall, while in other cases it may have low Precision but high Recall. Hence both measures are very important, and there is a need for a single parameter that takes both Precision and Recall into account.
4. F1 Score
The F1 Score is the balance between Precision and Recall. It is the harmonic mean of the two:
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
In conclusion, we can say that a model has good performance if the F1 Score for that model is high.
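Putting it together, a short sketch that computes Precision, Recall and the F1 Score from one set of hypothetical confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts for a fire-detection model.
tp, tn, fp, fn = 6, 90, 4, 2

precision = tp / (tp + fp)                                # 0.60
recall = tp / (tp + fn)                                   # 0.75
f1_score = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"Precision = {precision:.2f}, Recall = {recall:.2f}, F1 Score = {f1_score:.2f}")
```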