Table of contents
What is Evaluation?
Evaluation Terminologies
Confusion Matrix
Precision
F1 Score
Which Metric is Important?
Evaluation is a process that critically examines a program. It involves collecting and analyzing information about a program’s activities, characteristics, and outcomes. Its purpose is to make judgments about a program, to improve its effectiveness, and/or to inform programming decisions.
Let me explain this in simpler terms:
Evaluation is basically checking the performance of your AI model. This is done by comparing two things: the "Prediction" made by the model and the "Reality" (what actually happens on the ground).
You can do this to judge how well the model works, to improve its effectiveness, and to inform decisions about how and where it should be used.
Prediction and Reality
There are various terminologies that come up when we evaluate a model. Let's explore them with a Football scenario, in which an AI model predicts whether or not a football is present on the field.
There are four possible combinations of Prediction and Reality:
1. True Positive (TP): the model predicts a football is present, and one really is there.
2. True Negative (TN): the model predicts no football, and there really is none.
3. False Positive (FP): the model predicts a football is present, but in reality there is none.
4. False Negative (FN): the model predicts no football, but in reality one is present.
By analyzing these combinations, we can evaluate the performance and efficiency of the AI model. The goal is to maximize the number of True Positives and True Negatives while minimizing the number of False Positives and False Negatives.
The comparison between the results of Prediction and Reality is known as the Confusion Matrix.
The confusion matrix helps us interpret the prediction results. It is not an evaluation metric itself but serves as a record to aid in evaluation. Let’s review the four conditions related to the football example once more.
Now let us go through all the possible combinations of "Prediction" and "Reality" and see how we can use these conditions to evaluate the model.
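Before moving on, here is a minimal Python sketch (the prediction and reality lists are made up for the football scenario) that tallies the four confusion-matrix cells:

    # Hypothetical predictions and realities ("Yes" = football present)
    predictions = ["Yes", "No", "Yes", "No", "No", "Yes", "No", "No"]
    reality     = ["Yes", "No", "No",  "No", "Yes", "Yes", "No", "No"]

    tp = tn = fp = fn = 0
    for pred, real in zip(predictions, reality):
        if pred == "Yes" and real == "Yes":
            tp += 1   # True Positive: predicted football, football was there
        elif pred == "No" and real == "No":
            tn += 1   # True Negative: predicted no football, none was there
        elif pred == "Yes" and real == "No":
            fp += 1   # False Positive: predicted football, none was there
        else:
            fn += 1   # False Negative: predicted no football, one was there

    print({"TP": tp, "TN": tn, "FP": fp, "FN": fn})   # {'TP': 2, 'TN': 4, 'FP': 1, 'FN': 1}

With these four counts in hand, every metric below is just a different way of combining them.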
Definition: Accuracy is the percentage of “correct predictions out of all observations.” A prediction is considered correct if it aligns with the reality.
In this context, there are two scenarios where the Prediction matches the Reality: True Positives (the model predicts a football and one is really there) and True Negatives (the model predicts no football and there really is none).
Accuracy Formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
Here, total observations cover all the possible prediction outcomes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).
Example: Let’s revisit the Football example.
Assume the model always predicts that there is no football. In reality, there is a 2% chance of encountering a football. In this scenario, the model will be correct 98% of the time when it predicts no football. However, it will be incorrect in the 2% of cases where a football is actually present, as it incorrectly predicts no football.
Here, out of every 100 observations, the model is right in the 98 "no football" cases and wrong in the 2 cases where a football is present, giving an Accuracy of 98%.
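Here is a small Python sketch of this scenario (assuming exactly 100 observations for simplicity), showing why a high accuracy figure alone can be misleading:

    # The model ALWAYS predicts "no football".
    # Out of 100 observations, a football is actually present in 2 of them.
    tp, tn, fp, fn = 0, 98, 0, 2

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    print(f"Accuracy: {accuracy:.0%}")   # Accuracy: 98%

The score looks excellent, yet the model has never once detected a football, which is why we also need Precision and Recall.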
Definition: Precision is the percentage of "true positive cases" out of all the cases where the prediction is positive. It considers both True Positives and False Positives, and it measures how well the model identifies positive cases among all the cases it predicts as positive.
Precision Formula:
Precision = TP / (TP + FP) × 100%
In the football example, if the model always predicts the presence of a football regardless of reality, all positive predictions are evaluated, including: True Positives (a football really is present) and False Positives (no football is actually there).
Just like the story of the boy who falsely cried out about wolves and was ignored when real wolves arrived, if the precision is low (indicating more false positives), it could lead to complacency. Players might start ignoring the predictions, thinking they're mostly false, and thus fail to check for the ball when it’s actually there.
Example: Suppose, purely for illustration, that the model predicts "football present" in 50 cases, but a football is actually there in only 20 of them. Then TP = 20, FP = 30, and Precision = 20 / (20 + 30) = 40%.
Definition: Recall, also known as Sensitivity or True Positive Rate, is the fraction of actual positive cases that are correctly identified by the model.
In the football example, recall focuses on the cases where a football was actually present and examines how well the model detected it. It takes into account: True Positives (footballs the model correctly detected) and False Negatives (footballs that were present but the model failed to detect).
Recall Formula:
Recall = TP / (TP + FN) × 100%
In both Precision and Recall, the numerator is the same: True Positives. However, the denominators differ: Precision includes False Positives, while Recall includes False Negatives.
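To make the contrast concrete, here is a short Python sketch using the same illustrative counts as the precision example above (TP = 20, FP = 30) plus an assumed FN = 5:

    # Hypothetical confusion-matrix counts for the football scenario
    tp = 20   # football present, model said "present"
    fp = 30   # no football, but model said "present"
    fn = 5    # football present, but model said "absent"

    precision = tp / (tp + fp)   # penalised by false alarms (FP)
    recall    = tp / (tp + fn)   # penalised by missed footballs (FN)

    print(f"Precision: {precision:.0%}")   # Precision: 40%
    print(f"Recall:    {recall:.0%}")      # Recall: 80%

The same counts give very different scores on the two metrics, which is exactly why the choice of denominator matters.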
Definition: The F1 Score measures the balance between Precision and Recall. It is calculated as their harmonic mean: F1 = 2 × (Precision × Recall) / (Precision + Recall). It is used when there is no clear preference for one metric over the other, providing a single number that seeks a balance between them.
Choosing between Precision and Recall depends on the specific context and the costs associated with False Positives and False Negatives:
Cases of High FN Cost: when missing an actual positive case is very costly (for example, failing to detect a disease or a forest fire), Recall is the more important metric.
Cases of High FP Cost: when a false alarm is very costly (for example, flagging a genuine email as spam), Precision is the more important metric.
To sum up, if you want to assess your model’s performance comprehensively, both Precision and Recall are crucial metrics.
Both Precision and Recall range from 0 to 1, and so does the F1 Score, with 1 representing perfect performance.
Let us explore the variations we can have in the F1 Score: the F1 Score is 1 only when both Precision and Recall are 1; if either of them drops, the F1 Score drops with it, and if either is 0, the F1 Score is 0.
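As a final sketch, the hypothetical Precision/Recall pairs below show how the F1 Score behaves: it is high only when both of its inputs are high.

    def f1_score(precision, recall):
        # Harmonic mean of precision and recall (0 if both are 0)
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    # Hypothetical (precision, recall) pairs
    for p, r in [(1.0, 1.0), (0.9, 0.1), (0.5, 0.5), (0.0, 0.8)]:
        print(f"Precision={p:.1f}, Recall={r:.1f} -> F1={f1_score(p, r):.2f}")

Notice that the pair (0.9, 0.1) averages to 0.5 arithmetically, but its F1 Score is only 0.18: the harmonic mean pulls the score toward the weaker of the two metrics.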