Results and Conclusions
Once the data has been cleaned and converted into a suitable transactional format, we can move on to association rule mining (ARM). Given the dimensions of the data, however, extracting meaningful and relevant association rules can be challenging. The R programming language was chosen for ARM because of its versatility and its range of data mining options, and the Apriori algorithm was applied to the Chicago Traffic Crashes dataset to extract association rules.
Analyzing the top 10 items in the dataset using an item frequency plot
The item frequency plot shows that the top 10 items in the dataset have almost equal frequencies, meaning that they occur about equally often. This could indicate that the items share characteristics or are related in some way, which could lead to the discovery of interesting patterns or associations between them. However, it is also possible that the items are unrelated and that their similar frequencies are a coincidence. Generating association rules is needed to determine the relationships between the items.
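As a minimal sketch of how such item frequencies can be obtained, the example below uses the arules functions itemFrequency and itemFrequencyPlot on a small made-up set of transactions. The item labels are borrowed from the rules listed in this section, but the baskets themselves and the names toy and trans are assumptions for illustration, not the actual crash data.

```r
library(arules)

# Hypothetical toy baskets; only the item labels come from this section.
toy <- list(
  c("FOLLOWING TOO CLOSELY", "IMPROPER BACKING", "IMPROPER LANE USAGE"),
  c("FOLLOWING TOO CLOSELY", "IMPROPER BACKING"),
  c("FOLLOWING TOO CLOSELY", "IMPROPER LANE USAGE"),
  c("FOLLOWING TOO CLOSELY", "DISREGARDING STOP SIGN")
)
trans <- as(toy, "transactions")

# Relative frequency (support) of each single item, most frequent first.
freq <- sort(itemFrequency(trans, type = "relative"), decreasing = TRUE)
print(head(freq, 10))

# Bar chart of the (up to) ten most frequent items.
itemFrequencyPlot(trans, topN = 10, type = "relative")
```

On the real data, trans would be replaced by the transactions object that is passed to apriori.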
Association rule mining was performed on the attributes of the car accidents, including the factors contributing to them. The rules are reported with the measures support, confidence, coverage, lift, and count.
In general, an association rule has two parts: an antecedent (if) and a consequent (then). A rule is usually written as A -> B, where A is the antecedent and B is the consequent. The antecedent is an item or set of items found in the data, and the consequent is an item found in combination with the antecedent.
The support of a rule is the proportion of transactions that contain both the antecedent and the consequent. The confidence of a rule is the proportion of transactions with the antecedent that also contain the consequent. The coverage of a rule is the proportion of transactions that contain the antecedent. The lift of a rule measures how much more often the antecedent and consequent occur together than would be expected if they were independent. Finally, the count of a rule is the number of transactions that contain both the antecedent and the consequent.
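The definitions above can be made concrete with a small worked example in base R. The transactions and the items A, B and C are made up, and rule_measures is a hypothetical helper written only for this illustration; it is not part of arules.

```r
# Made-up transactions over three hypothetical items A, B and C.
transactions <- list(
  c("A", "B"),
  c("A", "B", "C"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B")
)

# Measures for the rule lhs => rhs, computed directly from the definitions.
rule_measures <- function(trans, lhs, rhs) {
  n       <- length(trans)
  has_lhs <- vapply(trans, function(tx) all(lhs %in% tx), logical(1))
  has_rhs <- vapply(trans, function(tx) all(rhs %in% tx), logical(1))
  count      <- sum(has_lhs & has_rhs)      # transactions with both sides
  support    <- count / n                   # share with both sides
  coverage   <- sum(has_lhs) / n            # share with the antecedent
  confidence <- support / coverage          # share of antecedent cases with rhs
  lift       <- confidence / (sum(has_rhs) / n)  # relative to independence
  c(support = support, confidence = confidence,
    coverage = coverage, lift = lift, count = count)
}

m <- rule_measures(transactions, lhs = "A", rhs = "B")
print(m)
```

Here the rule {A} => {B} has a lift below 1 even though its confidence is 0.75, because B on its own already appears in 80% of the transactions.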
The following blocks each hold 15 rules generated with different values of the support, confidence, minimum length, and maximum length parameters.
rules <- apriori(transactions, parameter = list(supp=0.5, conf=0.9, maxlen=3, minlen=3, target= "rules"))
lhs rhs support confidence coverage lift count
[1] {DISREGARDING STOP SIGN,
DISTRACTION - FROM OUTSIDE VEHICLE} => {OPERATING VEHICLE IN RECKLESS or NEGLIGENT or AGGRESSIVE MANNER} 0.5106306 0.9446667 0.5405405 1.091361 1417
[2] {DISTRACTION - FROM OUTSIDE VEHICLE,
OPERATING VEHICLE IN RECKLESS or NEGLIGENT or AGGRESSIVE MANNER} => {DISREGARDING STOP SIGN} 0.5106306 0.9243314 0.5524324 1.086412 1417
[3] {DISREGARDING STOP SIGN,
DISTRACTION - FROM OUTSIDE VEHICLE} => {DISREGARDING TRAFFIC SIGNALS} 0.5243243 0.9700000 0.5405405 1.079723 1455
[4] {DISREGARDING TRAFFIC SIGNALS,
DISTRACTION - FROM OUTSIDE VEHICLE} => {DISREGARDING STOP SIGN} 0.5243243 0.9261617 0.5661261 1.088564 1455
[5] {DISREGARDING STOP SIGN,
DISTRACTION - FROM OUTSIDE VEHICLE} => {FAILING TO REDUCE SPEED TO AVOID CRASH} 0.5394595 0.9980000 0.5405405 1.032991 1497
[6] {DISTRACTION - FROM OUTSIDE VEHICLE,
FAILING TO REDUCE SPEED TO AVOID CRASH} => {DISREGARDING STOP SIGN} 0.5394595 0.9144777 0.5899099 1.074831 1497
[7] {DISREGARDING STOP SIGN,
DISTRACTION - FROM OUTSIDE VEHICLE} => {DRIVING SKILLS or EXPERIENCE} 0.5369369 0.9933333 0.5405405 1.025102 1490
[8] {DISTRACTION - FROM OUTSIDE VEHICLE,
DRIVING SKILLS or EXPERIENCE} => {DISREGARDING STOP SIGN} 0.5369369 0.9124311 0.5884685 1.072425 1490
[9] {DISREGARDING STOP SIGN,
DISTRACTION - FROM OUTSIDE VEHICLE} => {IMPROPER TURNING or NO SIGNAL} 0.5390991 0.9973333 0.5405405 1.022764 1496
[10] {DISTRACTION - FROM OUTSIDE VEHICLE,
IMPROPER TURNING or NO SIGNAL} => {DISREGARDING STOP SIGN} 0.5390991 0.9127517 0.5906306 1.072802 1496
[11] {DISREGARDING STOP SIGN,
DISTRACTION - FROM OUTSIDE VEHICLE} => {IMPROPER LANE USAGE} 0.5398198 0.9986667 0.5405405 1.011793 1498
[12] {DISTRACTION - FROM OUTSIDE VEHICLE,
IMPROPER LANE USAGE} => {DISREGARDING STOP SIGN} 0.5398198 0.9128580 0.5913514 1.072927 1498
[13] {DISREGARDING STOP SIGN,
DISTRACTION - FROM OUTSIDE VEHICLE} => {IMPROPER OVERTAKING or PASSING} 0.5401802 0.9993333 0.5405405 1.013949 1499
[14] {DISTRACTION - FROM OUTSIDE VEHICLE,
IMPROPER OVERTAKING or PASSING} => {DISREGARDING STOP SIGN} 0.5401802 0.9118005 0.5924324 1.071684 1499
[15] {DISREGARDING STOP SIGN,
DISTRACTION - FROM OUTSIDE VEHICLE} => {IMPROPER BACKING} 0.5405405 1.0000000 0.5405405 1.012774 1500
rules1 <- apriori(transactions, parameter = list(supp=0.2, conf=0.9, maxlen=2, minlen=1, target= "rules"))
lhs rhs support confidence coverage lift count
[1] {} => {FAILING TO REDUCE SPEED TO AVOID CRASH} 0.9661261 0.9661261 1.0000000 1.000000 2681
[2] {} => {DRIVING SKILLS or EXPERIENCE} 0.9690090 0.9690090 1.0000000 1.000000 2689
[3] {} => {IMPROPER TURNING or NO SIGNAL} 0.9751351 0.9751351 1.0000000 1.000000 2706
[4] {} => {IMPROPER LANE USAGE} 0.9870270 0.9870270 1.0000000 1.000000 2739
[5] {} => {IMPROPER OVERTAKING or PASSING} 0.9855856 0.9855856 1.0000000 1.000000 2735
[6] {} => {IMPROPER BACKING} 0.9873874 0.9873874 1.0000000 1.000000 2740
[7] {} => {FAILING TO YIELD RIGHT OF WAY} 0.9888288 0.9888288 1.0000000 1.000000 2744
[8] {} => {FOLLOWING TOO CLOSELY} 0.9949550 0.9949550 1.0000000 1.000000 2761
[9] {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)} => {OPERATING VEHICLE IN RECKLESS or NEGLIGENT or AGGRESSIVE MANNER} 0.2057658 0.9360656 0.2198198 1.081425 571
[10] {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)} => {DISREGARDING TRAFFIC SIGNALS} 0.2115315 0.9622951 0.2198198 1.071147 587
[11] {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)} => {FAILING TO REDUCE SPEED TO AVOID CRASH} 0.2172973 0.9885246 0.2198198 1.023184 603
[12] {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)} => {DRIVING SKILLS or EXPERIENCE} 0.2187387 0.9950820 0.2198198 1.026907 607
[13] {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)} => {IMPROPER TURNING or NO SIGNAL} 0.2190991 0.9967213 0.2198198 1.022137 608
[14] {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)} => {IMPROPER LANE USAGE} 0.2194595 0.9983607 0.2198198 1.011483 609
[15] {HAD BEEN DRINKING (USE WHEN ARREST IS NOT MADE)} => {IMPROPER OVERTAKING or PASSING} 0.2194595 0.9983607 0.2198198 1.012962 609
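The reported measures are linked by simple arithmetic, which gives a quick consistency check on the listings. The sketch below copies the values of rule [1] from each of the two listings above; cross_check is a hypothetical helper name introduced only for this illustration.

```r
# Reported values copied from rule [1] of the first listing.
r1 <- c(support = 0.5106306, confidence = 0.9446667,
        coverage = 0.5405405, lift = 1.091361, count = 1417)
# Reported values copied from rule [1] of the second listing (empty LHS).
r2 <- c(support = 0.9661261, confidence = 0.9661261,
        coverage = 1.0000000, lift = 1.000000, count = 2681)

cross_check <- function(r) {
  c(
    # confidence should equal support / coverage
    confidence_from_ratio = unname(r["support"] / r["coverage"]),
    # total number of transactions implied by count and support
    n_transactions = unname(round(r["count"] / r["support"])),
    # support of the consequent alone, since lift = confidence / supp(rhs)
    rhs_support = unname(r["confidence"] / r["lift"])
  )
}

print(cross_check(r1))
print(cross_check(r2))
```

Both rules imply the same total of 2775 transactions. For the rules with an empty antecedent, coverage is 1, confidence equals the support of the consequent, and lift is exactly 1 by construction, so those rules only restate how frequent each item is.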
The rules and visualizations below use the following parameters: supp=0.5, conf=0.9, maxlen=3, minlen=3.
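A top-15 list by a given measure can be produced with the arules functions sort and head. The sketch below is only an illustration under stated assumptions: it runs on made-up toy transactions with items A, B and C, and the thresholds are lowered so that the toy data yields rules at all.

```r
library(arules)

# Hypothetical toy transactions standing in for the crash data.
toy <- list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B", "C"),
  c("A")
)
trans <- as(toy, "transactions")

# Thresholds lowered for the toy data; this section itself uses
# supp = 0.5, conf = 0.9, minlen = 3, maxlen = 3.
rules <- apriori(trans,
                 parameter = list(supp = 0.2, conf = 0.5, minlen = 2,
                                  target = "rules"),
                 control = list(verbose = FALSE))

# Up to 15 rules with the highest value of each measure.
top_support    <- head(sort(rules, by = "support"), n = 15)
top_confidence <- head(sort(rules, by = "confidence"), n = 15)
top_lift       <- head(sort(rules, by = "lift"), n = 15)

inspect(top_lift)
```

sort orders rules in decreasing order of the chosen measure by default, so head returns the strongest rules first.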
Top 15 Rules by Support
Based on the association rule analysis, the most frequent combination of traffic violations in this dataset is "FAILING TO YIELD RIGHT OF WAY" and "IMPROPER OVERTAKING or PASSING" together with "FOLLOWING TOO CLOSELY", which has a support of 0.9827027.
This means that these three traffic violations occur together in 98.27% of the cases in the dataset. The confidence and lift values are both close to 1. The high confidence shows that the consequent almost always accompanies the antecedent, but a lift close to 1 means that this happens about as often as would be expected if the two were independent, so the rule mainly reflects how common these items are rather than a strong dependence between them.
The other rules with high support and confidence values are also worth investigating, as they provide insights into other common patterns of traffic violations in the dataset.
Top 15 Rules by Confidence
Based on the results of the association rule mining, distractions from outside the vehicle and disregard for traffic signals or stop signs co-occur with high confidence with improper backing, failing to yield right of way, and following too closely. Operating a vehicle in a reckless or negligent or aggressive manner also co-occurs with high confidence with these three driving behaviors. Failing to reduce speed to avoid a crash, lack of driving skills or experience, and improper turning or failure to signal also appear to be associated with failing to yield right of way and following too closely.
It is important to note that these results are based on the specific dataset and parameters used in the analysis and may not be representative of all driving behaviors in all contexts. Additionally, correlation does not necessarily imply causation, so further investigation and analysis would be necessary to fully understand the relationships between these variables.
Top 15 Rules by Lift
The top 15 rules, sorted by descending lift, are shown above. These rules reveal interesting relationships between the contributing factors to car accidents. For example, rule [1] states that accidents with poor vehicle and driver conditions are more likely to involve disregarding stop signs. Similarly, rule [2] states that accidents caused by distraction and driving on the wrong side of the road are more likely to involve reckless or aggressive driving.
By analyzing these rules, policymakers, law enforcement agencies, and car manufacturers can identify the factors that contribute most to car accidents and take measures to prevent them. For example, they could develop new safety features that address the most common causes of accidents or launch public awareness campaigns to educate drivers about the risks associated with certain behaviors.
Interactive network graph for 50 rules
A connected graph for 10 rules using the lift measure
Scatter plot for 500 rules
We can also select rules with specified LHS and RHS and visualize them as below:
Rules based on Driving on Wrong Side or Wrong Way (RHS)
Rules based on Under the Influence of Alcohol or Drugs (LHS)
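Selecting rules by their LHS or RHS can be sketched with the arules subset function. The example runs on made-up toy transactions with hypothetical items A, B and C; on the real data, the item strings would have to match the labels in the dataset exactly.

```r
library(arules)

# Hypothetical toy transactions standing in for the crash data.
toy <- list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B", "C"),
  c("A")
)
trans <- as(toy, "transactions")

# Thresholds lowered so that the toy data produces rules.
rules <- apriori(trans,
                 parameter = list(supp = 0.2, conf = 0.5, minlen = 2,
                                  target = "rules"),
                 control = list(verbose = FALSE))

# Rules whose consequent (RHS) is item "C".
rules_rhs <- subset(rules, subset = rhs %in% "C")
# Rules whose antecedent (LHS) contains item "A".
rules_lhs <- subset(rules, subset = lhs %in% "A")

inspect(rules_rhs)
inspect(rules_lhs)
```

The resulting rule sets can then be passed to the same plotting calls as the full rule set.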
Network graph visualization for 100 rules with lift as the measure
Key Insights and Understanding:
Below are some insights that we gathered from the association rule mining analysis on the Chicago Traffic Crashes dataset:
Following too closely is the most frequent contributing factor in the dataset (support 0.9949550), which points to rear-end collisions as the most common type of crash in Chicago.
Failing to yield right of way is a common contributing factor in crashes involving following too closely, improper backing, and improper overtaking or passing.
Improper lane usage is often associated with following too closely and failing to yield right of way.
The high support and confidence values show that the antecedent and consequent items occur together in most transactions.
The lift values close to 1, however, indicate that the antecedent and consequent variables are not strongly dependent on each other, so the high confidence largely reflects how frequent the consequents are on their own.
These insights suggest that rear-end collisions caused by following too closely are a major issue in Chicago traffic crashes. Failing to yield right of way and improper lane usage are common contributing factors in these crashes. Therefore, efforts should be made to improve driver education and awareness regarding safe following distances, yielding right of way, and proper lane usage. Additionally, law enforcement should prioritize these types of violations in order to reduce the number of crashes in the city.
Conclusions
Using association rule mining (ARM) analysis on the Chicago traffic crashes data, several frequent patterns and relationships between different crash factors were identified. The results indicate that certain combinations of crash factors occur together more often than others.
Some of the top rules identified by the ARM analysis include:
Failing to yield right of way and improper overtaking or passing are highly associated with following too closely.
Failing to yield right of way and improper lane usage are highly associated with following too closely.
Improper backing and improper lane usage are highly associated with following too closely.
These findings can help us better understand the circumstances under which crashes are most likely to occur and can guide efforts to improve road safety. For example, the results suggest that interventions aimed at reducing following too closely, such as education campaigns or increased enforcement, could be particularly effective if they focus on addressing failing to yield right of way, improper overtaking or passing, improper lane usage, improper backing, or some combination of these factors.
Overall, the ARM analysis provides valuable insights into the relationships between different crash factors and can help inform efforts to reduce the frequency and severity of crashes.