Results and Conclusions
Gaussian Naive Bayes Model
Training Accuracy ~71.45%; Testing Accuracy ~71.38%
Confusion Matrix
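Based on the confusion matrix, the Naive Bayes classifier achieved an overall test accuracy of 0.71 (98,588 of 138,112 accidents classified correctly). Because the two classes are almost evenly balanced (68,890 non-severe and 69,222 severe accidents), this is a clear improvement over the roughly 50% expected from always predicting one class, though it leaves considerable room for improvement.
The model correctly predicted 51,202 non-severe accidents and 47,386 severe accidents, but it misclassified 17,688 non-severe accidents as severe (false positives) and 21,836 severe accidents as non-severe (false negatives). The model therefore has difficulty separating the two classes, and it misses severe accidents more often than it raises false alarms.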
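The accuracy and error rates can be recomputed directly from the four counts in the confusion matrix. The snippet below is a minimal sketch that treats "severe" as the positive class; the variable names are illustrative and not taken from the original analysis.

```python
# Counts reported in the confusion matrix (positive class = severe).
tn, fp = 51_202, 17_688   # actual non-severe: predicted non-severe / predicted severe
fn, tp = 21_836, 47_386   # actual severe:     predicted non-severe / predicted severe

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Share of non-severe accidents wrongly flagged as severe.
false_positive_rate = fp / (fp + tn)
# Share of severe accidents the model missed.
false_negative_rate = fn / (fn + tp)

print(f"total={total} accuracy={accuracy:.4f}")
print(f"FPR={false_positive_rate:.4f} FNR={false_negative_rate:.4f}")
```

The false negative rate comes out higher than the false positive rate, which is the numerical form of the observation that severe accidents are the harder class for this model.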
Classification Report
The classification report shows a precision of 0.70 for non-severe accidents and 0.73 for severe accidents: of the accidents the model predicted as non-severe, 70% were actually non-severe, and of those it predicted as severe, 73% were actually severe. The recall values show that the model correctly identified 74% of all actual non-severe accidents but only 68% of all actual severe accidents.
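Precision and recall for each class follow from the same confusion-matrix counts. The sketch below uses a hypothetical helper, `precision_recall_f1`, written for illustration; it also derives the F1 score, which the text above does not report.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for one class from its confusion-matrix counts."""
    precision = true_pos / (true_pos + false_pos)   # correct share of predictions for this class
    recall = true_pos / (true_pos + false_neg)      # share of actual members that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the confusion matrix, with "severe" as the positive class.
tn, fp, fn, tp = 51_202, 17_688, 21_836, 47_386

# Severe class: hits are tp, false alarms are fp, misses are fn.
severe = precision_recall_f1(tp, fp, fn)
# Non-severe class: the roles swap, so hits are tn, false alarms are fn, misses are fp.
non_severe = precision_recall_f1(tn, fn, fp)

for name, (precision, recall, f1) in (("non-severe", non_severe), ("severe", severe)):
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision divides by the number of predictions made for a class, while recall divides by the number of actual members of that class, which is why the two metrics answer different questions.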
Conclusions
The Naive Bayes classifier achieved a moderate level of accuracy in predicting accident severity, classifying just over 71% of accidents correctly, but it had difficulty distinguishing between the two classes. The 21,836 false negatives and 17,688 false positives indicate that the model may benefit from refined feature selection or from an adjusted decision threshold, for example lowering the probability required to predict "severe" so that fewer severe accidents are missed, at the cost of more false alarms.
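To make the threshold idea concrete, the sketch below applies several cut-offs to a list of predicted probabilities of the severe class. The labels and probabilities are hypothetical toy values, not results from this model, and both helper functions are illustrative. In a scikit-learn workflow, the probabilities would typically come from the classifier's `predict_proba` method.

```python
def predict_with_threshold(severe_probabilities, threshold=0.5):
    """Label an accident severe (1) when P(severe) >= threshold, else non-severe (0)."""
    return [1 if p >= threshold else 0 for p in severe_probabilities]


def severe_recall_and_false_positives(y_true, y_pred):
    """Return (recall for the severe class, number of false positives)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return true_pos / sum(y_true), false_pos


# Hypothetical labels (1 = severe) and predicted P(severe) values, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
severe_probabilities = [0.90, 0.60, 0.45, 0.35, 0.55, 0.40, 0.20, 0.10]

results = {}
for threshold in (0.5, 0.4, 0.3):
    y_pred = predict_with_threshold(severe_probabilities, threshold)
    results[threshold] = severe_recall_and_false_positives(y_true, y_pred)
    recall, false_pos = results[threshold]
    print(f"threshold={threshold:.1f} severe recall={recall:.2f} false positives={false_pos}")
```

On this toy data, lowering the threshold raises recall for severe accidents while admitting more false positives. A suitable threshold for the real model would need to be chosen on held-out validation data.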
The training and test accuracies are nearly identical (71.45% versus 71.38%), so there is no sign of overfitting: variance is not the problem. However, both scores are modest, which suggests that the model is not capturing the underlying patterns in the data; that is, it underfits (high bias). Consequently, further refinement of the model is needed, and validation against new datasets may be necessary to confirm that it remains effective in different geographic regions or under different traffic conditions.
Insights and Takeaways
Naive Bayes is a probabilistic algorithm that assumes the input features are conditionally independent given the class. This assumption keeps the model simple and fast to train, which makes it a practical baseline for predicting crash severity, but it also limits how well the model can represent interacting features. Here the model reached an overall accuracy of 0.71 and had particular difficulty identifying severe accidents (recall of 0.68), so refining the feature set or adjusting the decision threshold are natural next steps.
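The independence assumption is easiest to see in how a Gaussian Naive Bayes model scores a single observation: each class score is the log prior plus a sum of independent per-feature Gaussian log-likelihoods. The sketch below is a minimal pure-Python illustration. The two features, the class priors, and all means and variances are hypothetical values chosen for the example, not parameters fitted in this analysis.

```python
import math


def gaussian_log_pdf(x, mean, variance):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def class_log_score(features, prior, means, variances):
    """Log prior plus the sum of per-feature log-likelihoods (the 'naive' independence step)."""
    score = math.log(prior)
    for x, mean, variance in zip(features, means, variances):
        score += gaussian_log_pdf(x, mean, variance)
    return score


def class_posteriors(features, params):
    """Normalize the per-class scores into posterior probabilities."""
    scores = {label: class_log_score(features, **p) for label, p in params.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical parameters for two made-up features (e.g. visibility and temperature).
params = {
    "non_severe": {"prior": 0.5, "means": [10.0, 60.0], "variances": [4.0, 100.0]},
    "severe": {"prior": 0.5, "means": [5.0, 40.0], "variances": [4.0, 100.0]},
}

posteriors = class_posteriors([5.0, 40.0], params)
print(posteriors)
```

Because the per-feature terms are simply added, the model cannot account for correlation between features, which is one reason its accuracy may plateau on data where features interact.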
A Naive Bayes classifier for accident severity can be one part of a larger effort to improve road safety. Applied to traffic, weather, and road condition data, it can help policymakers and law enforcement agencies identify patterns associated with severe crashes and develop targeted strategies to reduce them. The model's performance should always be evaluated on a separate dataset to confirm that it is reliable on new data. Naive Bayes may also not be the best choice for more complex problems or for problems with highly correlated input features, because correlated features violate its independence assumption. Nonetheless, it remains a useful starting point for predicting crash severity and for identifying factors that contribute to road safety.