Data Preparation
Data in Scope:
US Traffic - Fatalities and Fatality Rates by State
This dataset shows traffic fatalities and fatality rates by state for the year 2016 in the United States, with rates based on population, licensed drivers, and registered vehicles.
The dataset covers traffic fatalities across the states of the United States. For each state it records the number of licensed drivers, registered vehicles, and population (all in thousands), the number of fatalities per 100,000 drivers, registered vehicles, and population, and the total number of individuals killed in traffic accidents. With this data, one can try to identify states with higher fatality rates and explore which factors are most closely associated with road safety. The results can be used to develop targeted initiatives, policies, or drives to improve road safety and reduce the number of fatalities across the country.
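The relationship between the count columns (in thousands) and the rate columns (per 100,000) can be sketched as below. This is a minimal illustration that assumes the rates are derived directly from the totals. The function name and the input values are hypothetical and are not taken from the dataset.

```python
def rate_per_100k(fatalities, base_in_thousands):
    """Fatalities per 100,000 units of a base that is reported in thousands.

    fatalities / (base_in_thousands * 1,000) * 100,000
    simplifies to fatalities * 100 / base_in_thousands.
    """
    if base_in_thousands <= 0:
        raise ValueError("base_in_thousands must be positive")
    return fatalities * 100.0 / base_in_thousands


# Hypothetical state: 500 fatalities, population of 4,000 thousand (4 million).
example_rate = rate_per_100k(500, 4_000)
print(example_rate)
```

The same function applies to licensed drivers and registered vehicles, since the source reports all three bases in thousands.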
The dataset was already fairly clean, so little effort was needed to prepare it for the clustering algorithms. The data types of the attributes were correct, which made it easy to select the attributes needed for the analysis. Cleaning consisted mainly of two steps: removing labels and columns that were not relevant to the analysis, and scaling the remaining data so that attributes measured on different scales (for example, counts in thousands versus rates per 100,000) contribute comparably to the clustering.
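The two cleaning steps described above, column selection and scaling, could look roughly like the following. This is a sketch under stated assumptions: the source does not name the scaling method, so z-score standardization is assumed here, and the column names and row values are hypothetical placeholders rather than actual records from the dataset.

```python
from statistics import mean, pstdev

# Hypothetical rows; names and values are illustrative only.
rows = [
    {"state": "A", "population_k": 1000.0, "drivers_k": 700.0, "fatalities_per_100k_pop": 10.0},
    {"state": "B", "population_k": 5000.0, "drivers_k": 3600.0, "fatalities_per_100k_pop": 14.0},
    {"state": "C", "population_k": 9000.0, "drivers_k": 6100.0, "fatalities_per_100k_pop": 9.0},
]

# Numeric attributes kept for clustering; the "state" label is dropped.
FEATURES = ["population_k", "drivers_k", "fatalities_per_100k_pop"]


def select_features(records, features):
    """Keep only the chosen numeric columns, one list per record."""
    return [[float(record[name]) for name in features] for record in records]


def standardize(matrix):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in matrix
    ]


scaled = standardize(select_features(rows, FEATURES))
```

After this step every retained attribute has mean 0 and standard deviation 1, so no single column dominates the clustering because of its units.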
Snippet of the data before cleaning
Snippet of the data post cleaning