Results and Conclusions

Choosing the Best Parameters

GridSearchCV, from scikit-learn, is a powerful tool for hyperparameter tuning, an essential step in machine learning model development. Hyperparameter tuning involves selecting the combination of hyperparameters that yields the best model performance. GridSearchCV performs an exhaustive search over a specified grid of hyperparameters and evaluates each combination using cross-validation; the best hyperparameters are then selected according to a specified performance metric, such as accuracy or F1 score.

In this case, GridSearchCV has been used to determine the best arguments for an SVM classifier. The results are shown below.

The hyperparameter C controls the regularization strength in an SVM, with smaller values indicating stronger regularization, and the kernel determines the shape of the decision boundary used for classification; a linear kernel produces a linear boundary.

It's worth noting that the optimal hyperparameters found by GridSearchCV may depend on the specified grid of parameters and the problem being tackled, and may not necessarily generalize to other datasets. Therefore, it's important to perform hyperparameter tuning for each specific problem to obtain the best performance.
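The grid search described above can be sketched as follows. This is a minimal, hypothetical example: `make_classification` stands in for the preprocessed accidents data, and the parameter grid shown here is an assumption, not the exact grid used in the report.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data for the accidents features and severity labels.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical grid over kernel type and regularization strength C.
param_grid = {"kernel": ["linear", "poly", "rbf"], "C": [0.2, 1, 3]}

# Exhaustive search with 5-fold cross-validation, scored on accuracy.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)  # best kernel/C combination found
print(search.best_score_)   # mean cross-validated accuracy of that combination
```

`best_estimator_` then holds a model refit on the full training set with the winning parameters, ready for evaluation on held-out data.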

The top 10 models as per the GridSearchCV results are listed below.

Best performing SVM model: Kernel = linear; C = 0.2

Confusion Matrix

The confusion matrix of a model is a matrix used to evaluate the performance of the classification model. It summarizes the actual and predicted class labels for a set of data, and shows the number of correct and incorrect predictions made by the model. 

In the context of binary classification, the confusion matrix is typically represented as a 2x2 matrix with four entries, where each row represents an actual class label and each column represents a predicted class label:

True Positive (TP): The model correctly predicted the positive class. 

False Positive (FP): The model predicted the positive class, but the true class is negative. 

False Negative (FN): The model predicted the negative class, but the true class is positive.

True Negative (TN): The model correctly predicted the negative class.
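The four entries can be computed directly with scikit-learn's `confusion_matrix`. The labels below are a small hypothetical example, not the report's data; with the default label ordering, row 0 holds the actual negatives and row 1 the actual positives.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels: 0 = negative class, 1 = positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual labels, columns are predicted labels:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Unpack the four entries in row-major order.
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```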


Interpreting the confusion matrix

In this case, the rows of the confusion matrix represent the actual values of the target variable (the severity of an accident), and the columns represent the values predicted by the model. The confusion matrix shows the number of true positives, true negatives, false positives, and false negatives.

In this case, taking "Severe" as the positive class, the SVM model correctly predicted 1049 cases as "Not Severe" (true negatives) and 948 cases as "Severe" (true positives), while incorrectly predicting 227 actually "Severe" cases as "Not Severe" (false negatives) and 276 actually "Not Severe" cases as "Severe" (false positives).

While the model seems to perform reasonably well, there is room for improvement in reducing the number of false positives and false negatives to increase the accuracy of the model.


Classification Report

The classification report of a model provides precision, recall, F1 score, and support for each class, as well as the accuracy, macro average, and weighted average across all classes. 
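In scikit-learn, this report is produced by `classification_report`. The labels below are a toy hypothetical example; the report discussed next was generated from the SVM's actual test-set predictions.

```python
from sklearn.metrics import classification_report

# Hypothetical toy labels standing in for the test set and model predictions.
y_true = ["Not Severe", "Severe", "Severe", "Not Severe", "Severe"]
y_pred = ["Not Severe", "Severe", "Not Severe", "Not Severe", "Severe"]

# Per-class precision, recall, F1-score, and support, plus overall
# accuracy and the macro/weighted averages.
report = classification_report(y_true, y_pred)
print(report)
```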

The classification report for the SVM model with the chosen hyperparameters reveals several key insights. The precision for "Not Severe" accidents is 0.79, indicating that 79% of the cases predicted as "Not Severe" were actually "Not Severe"; similarly, the precision for "Severe" accidents is 0.81. The recall (sensitivity, or true positive rate) for "Not Severe" accidents is 0.82, meaning that 82% of the actual "Not Severe" accidents were correctly identified, while the recall for "Severe" accidents is 0.77.

Furthermore, the F1-score, which is the harmonic mean of precision and recall, is 0.81 for "Not Severe" accidents and 0.79 for "Severe" accidents. A higher F1-score indicates better model performance. Additionally, the overall accuracy of the model is 0.80, which implies that the model correctly predicted the severity of accidents in 80% of the cases. The macro average and weighted average of precision, recall, and F1-score are also around 0.80, suggesting consistent performance across both classes. 
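As a quick sanity check, the harmonic-mean relationship can be verified by recomputing the "Severe" F1-score from the reported precision and recall:

```python
# F1 = 2PR / (P + R), using the reported "Severe" precision and recall.
precision, recall = 0.81, 0.77
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.79, matching the reported "Severe" F1-score
```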

In conclusion, the SVM model with the given hyperparameters shows promising performance in terms of precision, recall, F1-score, and accuracy for predicting the severity of accidents based on the US accidents dataset. However, it may be beneficial to conduct further evaluation and fine-tuning of hyperparameters to potentially improve the model's performance, depending on specific requirements and goals.


Tweaking the C hyperparameter for the linear kernel:

SVM model 2

Kernel = linear; C = 1

Train Accuracy ~ 79.6%; Test Accuracy ~ 79.9%

SVM model 3

Kernel = linear; C = 3

Train Accuracy ~ 79.3%; Test Accuracy ~ 79.6%

Changing the Kernel to Polynomial

SVM model 4

Kernel = polynomial; Degree = 4; C = 3

Train Accuracy ~ 59.3%; Test Accuracy ~ 58.8%

SVM model 5

Kernel = polynomial; Degree = 4; C = 5

Train Accuracy ~ 62.7%; Test Accuracy ~ 62.2%

SVM model 6

Kernel = polynomial; Degree = 4; C = 10

Train Accuracy ~ 70.6%; Test Accuracy ~ 70.2%

Changing the Kernel to Radial Basis Function (RBF)

SVM model 7

Kernel = RBF; C = 2

Train Accuracy ~ 53%; Test Accuracy ~ 54%

SVM model 8

Kernel = RBF; C = 7

Train Accuracy ~ 58%; Test Accuracy ~ 57%

SVM model 9

Kernel = RBF; C = 5.5

Train Accuracy ~ 57.5%; Test Accuracy ~ 57%
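The kernel-and-C sweep above can be sketched as a simple loop. This is a hypothetical illustration: `make_classification` replaces the accidents features, so the accuracies it prints will of course differ from the report's numbers.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data for the accidents features and severity labels.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A subset of the settings from models 2-9 above.
configs = [
    {"kernel": "linear", "C": 1},
    {"kernel": "linear", "C": 3},
    {"kernel": "poly", "degree": 4, "C": 3},
    {"kernel": "poly", "degree": 4, "C": 10},
    {"kernel": "rbf", "C": 2},
    {"kernel": "rbf", "C": 7},
]

# Fit each configuration and record train/test accuracy to spot
# underfitting (both low) or overfitting (large train-test gap).
scores = {}
for params in configs:
    model = SVC(**params).fit(X_train, y_train)
    scores[str(params)] = (model.score(X_train, y_train),
                           model.score(X_test, y_test))
    print(params, scores[str(params)])
```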

Visualizing the decision boundary for the SVM classifier

It is beneficial to visualize the decision boundaries graphically to better understand how they are established. To plot the boundaries on a 2D graph, Principal Component Analysis (PCA) was used to reduce the feature space to two dimensions. PCA is a dimensionality reduction technique that projects high-dimensional data onto a lower-dimensional space while preserving important information such as variance.

Reducing the data to 2D using PCA allows for a scatter plot in which data points are positioned by their two leading principal components. The decision boundary (or margin) of the SVM model can then be drawn on the same graph, providing a visual representation of how the model separates data points of different classes. Visualizing the boundaries in 2D can aid in understanding the complexity of the relationships between features and classes, as well as in identifying potential model issues such as overfitting or underfitting. Evaluating the boundaries visually can also help assess model performance: well-defined boundaries that accurately separate data points of different classes may indicate good performance, while ambiguous or overlapping boundaries may suggest the need for further tuning or adjustments.
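The PCA-and-boundary procedure can be sketched as below. This is a hypothetical, minimal version: `make_classification` stands in for the accidents data, and the grid of predictions it produces is what a plotting call such as matplotlib's `contourf` would render as the decision regions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Synthetic stand-in data for the accidents features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Project onto the two leading principal components.
X2 = PCA(n_components=2).fit_transform(X)

# Fit the classifier in the reduced 2-D space.
clf = SVC(kernel="linear", C=0.2).fit(X2, y)

# Evaluate the classifier on a dense grid covering the projected data;
# coloring this grid by predicted class reveals the decision boundary.
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 200),
    np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 200),
)
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
print(Z.shape)  # one predicted label per grid point
```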

The SVM classifiers with different kernels after using PCA for dimensionality reduction are visualized below.

Performing PCA on a dataset before SVM classification can discard information about the original features, potentially decreasing accuracy. Conversely, retaining too many principal components defeats the purpose of the reduction and can contribute to overfitting and poor generalization. Careful consideration of the trade-off between dimensionality reduction and information loss is therefore crucial for optimizing the performance of SVM classifiers used with PCA.

Insights and Takeaways

Based on the classification reports of the above SVM models with different kernels (linear, polynomial, and RBF), the following observation can be made: the choice of kernel function can considerably affect the models' effectiveness in forecasting the seriousness of accidents. In this case, the linear kernel is the most effective, followed by the polynomial kernel, while the RBF kernel performs significantly worse. To maximize the precision and predictive power of the SVM model for this particular dataset, additional experimentation with different kernels and further hyperparameter tuning may be required.

Conclusions

Using an SVM classifier to predict the severity of accidents can be a promising approach for traffic incident analysis. The SVM model leverages the power of machine learning to learn patterns from labeled data and make predictions on unseen data. The choice of kernel function, such as linear, polynomial, or RBF, plays a crucial role in the performance of the SVM model. Based on the results obtained from the SVM models with the three different kernels, it can be observed that the linear kernel yielded the best performance. The polynomial kernel showed moderate performance, while the RBF kernel exhibited lower predictive power. These findings highlight the importance of selecting an appropriate kernel function in SVM models to enhance their performance for a specific dataset.

However, it is also important to note that the accuracy and predictive power of the SVM model may depend on various factors, such as the quality and size of the dataset, feature engineering, hyperparameter tuning, and the specific characteristics of the traffic incidents and fatalities being analyzed. Further analysis and experimentation may be necessary to fine-tune the SVM model and assess its generalizability and robustness in real-world applications. Nevertheless, the use of an SVM classifier for predicting accident severity based on the US accidents dataset holds the potential for providing valuable insights for traffic safety planning, resource allocation, and accident prevention strategies.

Source Code