Question 1178478: This question requires a confusion matrix. I am unsure how to calculate the accuracy given the percentages. How do I go about this? Thanks. :)

Both model A and model B achieve 80% accuracy on the balanced population, but this changes to 64% and 96% accuracy respectively on the true unbalanced population. If the population is more skewed, with 5% positive and 95% negative, what are the accuracies for each model? (The training population is 1,000. The confusion matrix for model A is

```
[500 200]
[  0 300]
```

and the confusion matrix for model B is

```
[300   0]
[200 500]
```
)
Algebra.Com's Answer #850338 by CPhill(1959)

Absolutely! Let's break down how to calculate accuracy from a confusion matrix and then apply it to your matrices and the skewed population scenario.

**Understanding Accuracy and Confusion Matrices**

* **Accuracy:** The overall correctness of a model, calculated as (True Positives + True Negatives) / Total Predictions.
* **Confusion Matrix:** A table that visualizes the performance of a classification model. It typically looks like this:

    ```
    |                 | Predicted Positive  | Predicted Negative  |
    |-----------------|---------------------|---------------------|
    | Actual Positive | True Positive (TP)  | False Negative (FN) |
    | Actual Negative | False Positive (FP) | True Negative (TN)  |
    ```

**Calculating Accuracy from a Confusion Matrix**

1. **Sum the Diagonal:** Add the True Positives (TP) and True Negatives (TN).
2. **Sum All Cells:** Add all four values (TP + FP + FN + TN). This is the total number of predictions.
3. 
**Divide:** Divide the diagonal sum by the sum of all cells.

**Applying It to Your Models**

Note that your matrices are written transposed relative to the table above, i.e. as [TP FP; FN TN]: each column then sums to the 500 actual positives and 500 actual negatives of the balanced 1,000-example training set.

* **Model A:**
    * Confusion Matrix:
    ```
    [500 200]
    [  0 300]
    ```
    * TP = 500, FP = 200, FN = 0, TN = 300
    * Accuracy = (500 + 300) / (500 + 200 + 0 + 300) = 800 / 1000 = 0.8, or 80%

* **Model B:**
    * Confusion Matrix:
    ```
    [300   0]
    [200 500]
    ```
    * TP = 300, FP = 0, FN = 200, TN = 500
    * Accuracy = (300 + 500) / (300 + 0 + 200 + 500) = 800 / 1000 = 0.8, or 80%

**The Unbalanced Population (64% and 96%)**

You've mentioned that the accuracies change to 64% and 96% on the "true unbalanced population." The confusion matrices were built on a balanced dataset, but the models are being evaluated on an unbalanced one: each model's per-class rates stay the same (Model A: recall = 500/500 = 1.0, specificity = 300/500 = 0.6; Model B: recall = 300/500 = 0.6, specificity = 500/500 = 1.0), and only the class proportions change. A 10% positive / 90% negative population reproduces exactly these figures: Model A gives 0.10(1.0) + 0.90(0.6) = 64%, and Model B gives 0.10(0.6) + 0.90(1.0) = 96%.

**The Highly Skewed Population (5% Positive, 95% Negative)**

Now, let's calculate the accuracies for the even more skewed population.

1. **Assume a Total Population:** Since the training population is 1,000, for simplicity let's keep the population at 1,000.
2. **Calculate Positive and Negative Counts:**
    * Positive: 1000 * 0.05 = 50
    * Negative: 1000 * 0.95 = 950
3. 
**Adjust the Confusion Matrices Based on the Proportions:**
    * Keep each model's per-class rates (TP out of actual positives, TN out of actual negatives) and scale them to the new class counts.
    * Model A:
        * TP: 500/500 * 50 = 50
        * FP: 200/500 * 950 = 380
        * FN: 0
        * TN: 300/500 * 950 = 570
    * Model B:
        * TP: 300/500 * 50 = 30
        * FP: 0
        * FN: 200/500 * 50 = 20
        * TN: 500/500 * 950 = 950
4. **Calculate Accuracies:**
    * Model A:
        * Accuracy = (50 + 570) / 1000 = 620 / 1000 = 0.62, or 62%
    * Model B:
        * Accuracy = (30 + 950) / 1000 = 980 / 1000 = 0.98, or 98%

**Therefore:**

* For the 5% positive, 95% negative population:
    * Model A's accuracy is 62%.
    * Model B's accuracy is 98%.

**Key Takeaway:**

* When dealing with unbalanced datasets, accuracy can be misleading: Model B appears to perform much better in the skewed population even though both models scored an identical 80% on the balanced set.
* It is critical to be aware of the underlying population distribution when evaluating a model.
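As a quick check, the basic accuracy calculation can be sketched in Python. The `accuracy` helper is just an illustrative name, and the TP/FP/FN/TN values follow the [TP FP; FN TN] reading of the matrices used above:

```python
def accuracy(tp, fp, fn, tn):
    """Overall accuracy: (TP + TN) / total predictions."""
    return (tp + tn) / (tp + fp + fn + tn)

# Model A: TP=500, FP=200, FN=0, TN=300
# Model B: TP=300, FP=0, FN=200, TN=500
print(f"Model A: {accuracy(500, 200, 0, 300):.0%}")  # Model A: 80%
print(f"Model B: {accuracy(300, 0, 200, 500):.0%}")  # Model B: 80%
```

Both models land at exactly 800/1000, which is why accuracy alone cannot distinguish them on the balanced set.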
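The re-weighting idea can be sketched the same way: hold each model's recall (rate on actual positives) and specificity (rate on actual negatives) fixed, and rescale to the new class counts. The function name `skewed_accuracy` is illustrative, and the assumption baked in is that per-class rates carry over from the balanced training set:

```python
def skewed_accuracy(tp, fp, fn, tn, pos_frac, total=1000):
    """Accuracy on a population with `pos_frac` positives, assuming the
    model's per-class rates carry over from the balanced training set."""
    recall = tp / (tp + fn)        # fraction of actual positives called correctly
    specificity = tn / (tn + fp)   # fraction of actual negatives called correctly
    pos = total * pos_frac         # e.g. 1000 * 0.05 = 50 positives
    neg = total * (1 - pos_frac)   # e.g. 1000 * 0.95 = 950 negatives
    return (recall * pos + specificity * neg) / total

# Model A: TP=500, FP=200, FN=0, TN=300;  Model B: TP=300, FP=0, FN=200, TN=500
for pos_frac in (0.10, 0.05):
    a = skewed_accuracy(500, 200, 0, 300, pos_frac)
    b = skewed_accuracy(300, 0, 200, 500, pos_frac)
    print(f"{pos_frac:.0%} positive -> Model A: {a:.0%}, Model B: {b:.0%}")
```

Running it at 10% positives recovers the stated 64% and 96%, and at 5% positives it gives 62% and 98%, matching the hand calculation.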