Results and Conclusions

Once the data has been cleaned and converted into a suitable transactional format, we can move on to association rule mining (ARM). Given the dimensions of the data, however, extracting meaningful and relevant association rules can be challenging. The R programming language was chosen for ARM because of its versatility and its range of data mining packages, and the Apriori algorithm was applied to the Chicago Traffic Crashes dataset to extract association rules.

Analyzing the top 10 items in the dataset using an item frequency plot

It was evident that the top 10 items in the dataset have almost equal frequencies as shown in the item frequency plot, meaning that these items are equally popular or frequently occurring in the dataset. This could indicate that the items have similar characteristics or are related in some way, and could potentially lead to the discovery of interesting patterns or associations between them. However, it is also possible that the items are completely unrelated and their similar frequencies are simply a coincidence. Further analysis, by generating association rules, would be needed to determine the relationships between the items.
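A minimal sketch of how such a plot can be produced with the arules package. The toy transactions below are placeholders rather than the crash data, and the plot is limited to three items only because the toy data has three; the analysis above used the top 10.

```r
library(arules)

# Toy transactions standing in for the crash data (contents are hypothetical).
toy <- as(list(
  c("FOLLOWING TOO CLOSELY", "IMPROPER LANE USAGE"),
  c("FOLLOWING TOO CLOSELY", "IMPROPER BACKING"),
  c("FOLLOWING TOO CLOSELY", "IMPROPER LANE USAGE", "IMPROPER BACKING"),
  c("IMPROPER LANE USAGE")
), "transactions")

# Relative frequency (support) of each single item, highest first.
item_freq <- sort(itemFrequency(toy, type = "relative"), decreasing = TRUE)
print(item_freq)

# Bar chart of the most frequent items; use topN = 10 on the real data.
itemFrequencyPlot(toy, topN = 3, type = "relative")
```

Near-equal bars in this kind of plot only show that the items are similarly common; they say nothing yet about whether the items occur in the same transactions.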

Association rule mining was performed on the attributes of car accidents in the dataset, including the factors contributing to the accidents. The generated rules are evaluated using five measures: support, confidence, coverage, lift, and count.

In general, an association rule has two parts: an antecedent (if) and a consequent (then). A rule is usually written as A -> B, where A is the antecedent and B is the consequent. The antecedent is an item or set of items found within the data; the consequent is an item found in combination with the antecedent.

The support of a rule is the proportion of transactions that contain both the antecedent and the consequent. The confidence of a rule is the proportion of transactions with the antecedent that also contain the consequent. The coverage of a rule is the proportion of transactions that contain the antecedent. The lift of a rule measures how much more often the antecedent and consequent occur together than would be expected if they were independent. Finally, the count of a rule is the number of transactions that contain both the antecedent and the consequent.
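As a worked illustration of these five definitions, here is a small base R sketch. The transactions, the item names A, B and C, and the helper functions `contains` and `rule_measures` are all hypothetical and exist only for this example.

```r
# Toy transactions (hypothetical items) to illustrate the five measures.
transactions <- list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B", "C")
)

# TRUE for each transaction that contains every item in `items`.
contains <- function(items) {
  vapply(transactions, function(tx) all(items %in% tx), logical(1))
}

rule_measures <- function(lhs, rhs) {
  n          <- length(transactions)
  count      <- sum(contains(c(lhs, rhs)))      # antecedent and consequent together
  support    <- count / n
  coverage   <- sum(contains(lhs)) / n          # support of the antecedent alone
  confidence <- support / coverage
  lift       <- confidence / (sum(contains(rhs)) / n)
  c(support = support, confidence = confidence,
    coverage = coverage, lift = lift, count = count)
}

m <- rule_measures(lhs = "A", rhs = "B")
print(m)
```

For the rule A -> B in this toy data, A and B each appear in 4 of 5 transactions and together in 3, so the lift comes out slightly below 1: the two items co-occur a little less often than independence would predict.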

The following listings each hold 15 rules generated under different settings of the support, confidence, minimum length, and maximum length parameters.

rules <- apriori(transactions, parameter = list(supp = 0.5, conf = 0.9, maxlen = 3, minlen = 3, target = "rules"))

      lhs                                                                  rhs                                                                 support confidence  coverage     lift count
[1]   {DISREGARDING STOP SIGN,
       DISTRACTION - FROM OUTSIDE VEHICLE}                              => {OPERATING VEHICLE IN RECKLESS or NEGLIGENT or AGGRESSIVE MANNER} 0.5106306  0.9446667 0.5405405 1.091361  1417
[2]   {DISTRACTION - FROM OUTSIDE VEHICLE,
       OPERATING VEHICLE IN RECKLESS or NEGLIGENT or AGGRESSIVE MANNER} => {DISREGARDING STOP SIGN}                                          0.5106306  0.9243314 0.5524324 1.086412  1417
[3]   {DISREGARDING STOP SIGN,
       DISTRACTION - FROM OUTSIDE VEHICLE}                              => {DISREGARDING TRAFFIC SIGNALS}                                    0.5243243  0.9700000 0.5405405 1.079723  1455
[4]   {DISREGARDING TRAFFIC SIGNALS,
       DISTRACTION - FROM OUTSIDE VEHICLE}                              => {DISREGARDING STOP SIGN}                                          0.5243243  0.9261617 0.5661261 1.088564  1455
[5]   {DISREGARDING STOP SIGN,
       DISTRACTION - FROM OUTSIDE VEHICLE}                              => {FAILING TO REDUCE SPEED TO AVOID CRASH}                          0.5394595  0.9980000 0.5405405 1.032991  1497
[6]   {DISTRACTION - FROM OUTSIDE VEHICLE,
       FAILING TO REDUCE SPEED TO AVOID CRASH}                          => {DISREGARDING STOP SIGN}                                          0.5394595  0.9144777 0.5899099 1.074831  1497
[7]   {DISREGARDING STOP SIGN,
       DISTRACTION - FROM OUTSIDE VEHICLE}                              => {DRIVING SKILLS or EXPERIENCE}                                    0.5369369  0.9933333 0.5405405 1.025102  1490
[8]   {DISTRACTION - FROM OUTSIDE VEHICLE,
       DRIVING SKILLS or EXPERIENCE}                                    => {DISREGARDING STOP SIGN}                                          0.5369369  0.9124311 0.5884685 1.072425  1490
[9]   {DISREGARDING STOP SIGN,
       DISTRACTION - FROM OUTSIDE VEHICLE}                              => {IMPROPER TURNING or NO SIGNAL}                                   0.5390991  0.9973333 0.5405405 1.022764  1496
[10]  {DISTRACTION - FROM OUTSIDE VEHICLE,
       IMPROPER TURNING or NO SIGNAL}                                   => {DISREGARDING STOP SIGN}                                          0.5390991  0.9127517 0.5906306 1.072802  1496
[11]  {DISREGARDING STOP SIGN,
       DISTRACTION - FROM OUTSIDE VEHICLE}                              => {IMPROPER LANE USAGE}                                             0.5398198  0.9986667 0.5405405 1.011793  1498
[12]  {DISTRACTION - FROM OUTSIDE VEHICLE,
       IMPROPER LANE USAGE}                                             => {DISREGARDING STOP SIGN}                                          0.5398198  0.9128580 0.5913514 1.072927  1498
[13]  {DISREGARDING STOP SIGN,
       DISTRACTION - FROM OUTSIDE VEHICLE}                              => {IMPROPER OVERTAKING or PASSING}                                  0.5401802  0.9993333 0.5405405 1.013949  1499
[14]  {DISTRACTION - FROM OUTSIDE VEHICLE,
       IMPROPER OVERTAKING or PASSING}                                  => {DISREGARDING STOP SIGN}                                          0.5401802  0.9118005 0.5924324 1.071684  1499
[15]  {DISREGARDING STOP SIGN,
       DISTRACTION - FROM OUTSIDE VEHICLE}                              => {IMPROPER BACKING}                                                0.5405405  1.0000000 0.5405405 1.012774  1500


rules1 <- apriori(transactions, parameter = list(supp = 0.2, conf = 0.9, maxlen = 2, minlen = 1, target = "rules"))

     lhs                                                        rhs                                                               support   confidence coverage  lift     count
[1]   {}                                                      => {FAILING TO REDUCE SPEED TO AVOID CRASH}                          0.9661261 0.9661261  1.0000000 1.000000 2681
[2]   {}                                                      => {DRIVING SKILLS or EXPERIENCE}                                    0.9690090 0.9690090  1.0000000 1.000000 2689
[3]   {}                                                      => {IMPROPER TURNING or NO SIGNAL}                                   0.9751351 0.9751351  1.0000000 1.000000 2706
[4]   {}                                                      => {IMPROPER LANE USAGE}                                             0.9870270 0.9870270  1.0000000 1.000000 2739
[5]   {}                                                      => {IMPROPER OVERTAKING or PASSING}                                  0.9855856 0.9855856  1.0000000 1.000000 2735
[6]   {}                                                      => {IMPROPER BACKING}                                                0.9873874 0.9873874  1.0000000 1.000000 2740
[7]   {}                                                      => {FAILING TO YIELD RIGHT OF WAY}                                   0.9888288 0.9888288  1.0000000 1.000000 2744
[8]   {}                                                      => {FOLLOWING TOO CLOSELY}                                           0.9949550 0.9949550  1.0000000 1.000000 2761
[9]   {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)}       => {OPERATING VEHICLE IN RECKLESS or NEGLIGENT or AGGRESSIVE MANNER} 0.2057658 0.9360656  0.2198198 1.081425  571
[10]  {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)}       => {DISREGARDING TRAFFIC SIGNALS}                                    0.2115315 0.9622951  0.2198198 1.071147  587
[11]  {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)}       => {FAILING TO REDUCE SPEED TO AVOID CRASH}                          0.2172973 0.9885246  0.2198198 1.023184  603
[12]  {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)}       => {DRIVING SKILLS or EXPERIENCE}                                    0.2187387 0.9950820  0.2198198 1.026907  607
[13]  {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)}       => {IMPROPER TURNING or NO SIGNAL}                                   0.2190991 0.9967213  0.2198198 1.022137  608
[14]  {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)}       => {IMPROPER LANE USAGE}                                             0.2194595 0.9983607  0.2198198 1.011483  609
[15]  {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)}       => {IMPROPER OVERTAKING or PASSING}                                  0.2194595 0.9983607  0.2198198 1.012962  609
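The reported measures can be cross-checked against each other. The sketch below takes rule [15] from the first listing and the support of IMPROPER BACKING from the second, then recovers the number of transactions and the lift by hand; the variable names are illustrative only.

```r
# Values copied from the two rule listings above.
count_rule15   <- 1500        # {DISREGARDING STOP SIGN, DISTRACTION - FROM OUTSIDE VEHICLE} => {IMPROPER BACKING}
support_rule15 <- 0.5405405
conf_rule15    <- 1.0000000
supp_backing   <- 0.9873874   # support of {} => {IMPROPER BACKING}

# support = count / n, so the number of transactions follows from any row.
n_transactions <- round(count_rule15 / support_rule15)

# lift = confidence / support(consequent)
lift_rule15 <- conf_rule15 / supp_backing

cat("transactions:", n_transactions, "\n")
cat("lift:", round(lift_rule15, 6), "\n")
```

This gives 2775 transactions and a lift of 1.012774, matching the listing. It also shows why the lift is so close to 1 even at a confidence of 1.0: the consequent is present in almost 99% of all transactions.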

The rules and visualizations below use the following parameters: supp=0.5, conf=0.9, maxlen=3, minlen=3.

Top 15 Rules by Support

Based on the association rule analysis, we can see that the most frequent combination of traffic violations in this dataset is "FAILING TO YIELD RIGHT OF WAY" and "IMPROPER OVERTAKING or PASSING" together with "FOLLOWING TOO CLOSELY", which has a support of 0.9827027.

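This means that these three traffic violations occur together in 98.27% of the cases in the dataset. The confidence of this rule is close to 1, but so is its lift, and a lift near 1 means the antecedent and consequent occur together about as often as would be expected if they were independent. The high confidence therefore largely reflects how common these items are in the dataset rather than a strong association between them.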

The other rules with high support and confidence values are also worth investigating, as they provide insights into other common patterns of traffic violations in the dataset.

Top 15 Rules by Confidence

Based on the results of the association rule mining, it appears that distractions from outside the vehicle and disregard for traffic signals or stop signs are strongly associated with improper backing, failing to yield right of way, and following too closely. Additionally, operating a vehicle in a reckless or negligent or aggressive manner is also strongly associated with these three driving behaviors. Failing to reduce speed to avoid a crash, lack of driving skills or experience, and improper turning or failure to signal also appear to be associated with failing to yield right of way and following too closely.

It is important to note that these results are based on the specific dataset and parameters used in the analysis and may not be representative of all driving behaviors in all contexts. Additionally, correlation does not necessarily imply causation, so further investigation and analysis would be necessary to fully understand the relationships between these variables.

Top 15 Rules by Lift

The top 15 rules, sorted by descending lift, are shown above. These rules reveal interesting relationships between the contributing factors to car accidents. For example, rule [1] states that accidents with poor vehicle and driver conditions are more likely to involve disregarding stop signs. Similarly, rule [2] states that accidents caused by distraction and driving on the wrong side of the road are more likely to involve reckless or aggressive driving.

By analyzing these rules, policymakers, law enforcement agencies, and car manufacturers can identify the factors that contribute most to car accidents and take measures to prevent them. For example, they could develop new safety features that address the most common causes of accidents or launch public awareness campaigns to educate drivers about the risks associated with certain behaviors.
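The three rankings discussed above (by support, confidence, and lift) can be produced with `sort` and `head` from arules. This is a sketch on toy transactions with hypothetical items A, B and C and looser thresholds than the analysis used, not the original script.

```r
library(arules)

# Toy transactions standing in for the crash data (items are hypothetical).
toy <- as(list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B", "C")
), "transactions")

# minlen = 2 excludes rules with an empty antecedent.
rules <- apriori(toy,
                 parameter = list(supp = 0.2, conf = 0.5, minlen = 2, target = "rules"),
                 control = list(verbose = FALSE))

# sort() orders rules by the chosen measure, highest first by default.
top_support    <- head(sort(rules, by = "support"), n = 15)
top_confidence <- head(sort(rules, by = "confidence"), n = 15)
top_lift       <- head(sort(rules, by = "lift"), n = 15)

inspect(top_lift)
```

Ranking by lift rather than support tends to surface rules whose items co-occur more than chance would predict, which is usually what makes a rule interesting.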

Interactive network graph for 50 rules

Connected graph for 10 rules using the lift measure

Scatter plot for 500 rules

We can also select rules with specified LHS and RHS and visualize them as below:

Rules based on Driving on Wrong Side or Wrong Way (RHS)

Rules based on Under the Influence of Alcohol or Drugs (LHS)
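Selecting rules by a given LHS or RHS item can be done with `subset` from arules. The sketch below uses toy transactions and placeholder items A and C; on the crash data these would be replaced by the exact item labels, which are not reproduced here.

```r
library(arules)

# Toy transactions standing in for the crash data (items are hypothetical).
toy <- as(list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B", "C")
), "transactions")

rules <- apriori(toy,
                 parameter = list(supp = 0.2, conf = 0.5, minlen = 2, target = "rules"),
                 control = list(verbose = FALSE))

# Rules whose consequent is item "C".
rules_rhs <- subset(rules, subset = rhs %in% "C")

# Rules whose antecedent contains item "A".
rules_lhs <- subset(rules, subset = lhs %in% "A")

inspect(rules_rhs)
inspect(rules_lhs)
```

The filtered rule sets can then be passed to the same plotting calls as the full set.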

Network graph visualization for 100 rules with lift as the measure

Key Insights and Understanding:

Below are some insights that we gathered from the association rule mining analysis on the Chicago Traffic Crashes dataset:

From these insights, we can conclude that rear-end collisions caused by following too closely are a major issue in Chicago traffic crashes. Failing to yield right of way and improper lane usage are common contributing factors in these crashes. Therefore, efforts should be made to improve driver education and awareness regarding safe following distances, yielding right of way, and proper lane usage. Additionally, law enforcement should prioritize these types of violations in order to reduce the number of crashes in the city.


Conclusions

Using association rule mining (ARM) analysis on the Chicago traffic crashes data, several frequently co-occurring patterns and relationships between different crash factors were identified. The results indicate that certain combinations of crash factors have a higher likelihood of occurring together than others.

Some of the top rules identified by the ARM analysis include:

These findings can help us better understand the circumstances under which crashes are most likely to occur and can guide efforts to improve road safety. For example, the results suggest that interventions aimed at reducing following too closely, such as education campaigns or increased enforcement, could be particularly effective if they focus on addressing failing to yield right of way, improper overtaking or passing, improper lane usage, improper backing, or some combination of these factors.

Overall, the ARM analysis provides valuable insights into the relationships between different crash factors and can help inform efforts to reduce the frequency and severity of crashes.


Association Rule Mining Code