Data Preparation

Data in Scope

The study uses a publicly available dataset of images extracted from CCTV footage, assembled for the purpose of real-time accident detection. The dataset is split into three folders (train, test, and validation), each containing accident and non-accident images. The images come from various CCTV cameras and cover a diverse range of accidents, including car crashes, pedestrian accidents, and bicycle accidents.

The dataset was already pre-processed for direct use in modeling: the images are in a format compatible with the neural network architecture and require no further cleaning. Because the data is already partitioned into training, validation, and testing subsets, the model can be trained and evaluated on separate data, which helps check that it generalizes to new and unseen images. The work can therefore focus on developing an accurate model for accident detection from CCTV footage rather than on data preparation and partitioning.
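As a quick sanity check on the folder layout described above, the number of images per split and class can be counted before any modeling. The sketch below is illustrative only: the folder names in SPLITS and CLASSES and the helper count_images are assumptions, not names confirmed by the dataset, and should be adjusted to the actual directory structure.

```python
from pathlib import Path

# Hypothetical folder and class names; adjust to the actual dataset layout.
SPLITS = ("train", "test", "val")
CLASSES = ("Accident", "Non Accident")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {split: {class_name: image_count}} for a dataset rooted at `root`."""
    root = Path(root)
    counts = {}
    for split in SPLITS:
        counts[split] = {}
        for class_name in CLASSES:
            class_dir = root / split / class_name
            if class_dir.is_dir():
                # Count only files whose extension looks like an image.
                n = sum(
                    1
                    for p in class_dir.iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
                )
            else:
                n = 0  # a missing folder is reported as zero images
            counts[split][class_name] = n
    return counts
```

Calling count_images on the dataset's root folder returns a nested dictionary, which makes it easy to spot an empty or heavily imbalanced split.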

Understanding the Data

An image can be thought of as a grid of small picture elements known as pixels, stored as a two-dimensional array for grayscale images or a three-dimensional array for color images. In a grayscale image, each pixel is a single value representing the intensity of light at that position. In a color image, each pixel is a combination of three values, one each for Red, Green, and Blue (RGB), and each between 0 and 255. An image can therefore be represented as a multi-dimensional array of pixel values whose dimensions correspond to the height, width, and number of color channels of the image.
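The array view of an image can be illustrated with NumPy. This is a minimal sketch with made-up dimensions (the dataset's real image size is not stated here); it follows the common convention of ordering the axes as height, width, channels.

```python
import numpy as np

# Illustrative sizes only; the dataset's real image dimensions are not stated here.
height, width = 4, 6

# Grayscale: one intensity value per pixel -> 2-D array (height, width).
gray = np.zeros((height, width), dtype=np.uint8)

# Color: three values (R, G, B) per pixel -> 3-D array (height, width, 3).
rgb = np.zeros((height, width, 3), dtype=np.uint8)

# Set the top-left pixel to pure red.
rgb[0, 0] = (255, 0, 0)

print(gray.shape)          # (4, 6)
print(rgb.shape)           # (4, 6, 3)
print(rgb[0, 0].tolist())  # [255, 0, 0]
```

The uint8 data type holds exactly the integers 0 to 255, which is why it is the usual storage type for 8-bit image channels.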

In the RGB model, the intensity of each color component is a value between 0 and 255, where 0 means the component is absent and 255 means it is at full intensity. For instance, red is represented by the RGB value (255, 0, 0), green by (0, 255, 0), and blue by (0, 0, 255). Combining these primary colors in varying proportions produces the full range of colors seen in an image.
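To make the idea of combining primaries concrete, the following sketch adds RGB triples channel by channel and clips each result to the 0 to 255 range. The helper mix is a hypothetical name introduced only for this illustration.

```python
def mix(*colors):
    """Additively combine RGB triples, clipping each channel to 0-255."""
    return tuple(min(255, sum(channel)) for channel in zip(*colors))


RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)

yellow = mix(RED, GREEN)       # (255, 255, 0)
white = mix(RED, GREEN, BLUE)  # (255, 255, 255)
```

Full-intensity red plus green gives yellow, and all three primaries at full intensity give white, while (0, 0, 0) is black.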

Snippets from training data

[Figure: sample accident images and non-accident images]

Training set  - Link

Train samples independent variables' array

Train samples dependent variable array

Snippets from testing dataset

[Figure: sample accident images and non-accident images]

Testing set  - Link

Test samples independent variables' array

Test samples dependent variable array
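
The independent-variable and dependent-variable arrays referred to above could be assembled roughly as follows. This is a sketch under stated assumptions, not the procedure used in the study: the helper build_arrays, the label encoding (1 for accident, 0 for non-accident), and the random 8x8 stand-in images are all hypothetical.

```python
import numpy as np


def build_arrays(accident_images, non_accident_images):
    """Stack equally sized images into X and build the matching label vector y."""
    X = np.stack(list(accident_images) + list(non_accident_images))
    # Hypothetical label encoding: 1 = accident, 0 = non-accident.
    y = np.array([1] * len(accident_images) + [0] * len(non_accident_images))
    return X, y


# Synthetic stand-ins for real images (random 8x8 RGB arrays).
rng = np.random.default_rng(0)
accidents = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(3)]
non_accidents = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(2)]

X, y = build_arrays(accidents, non_accidents)
print(X.shape)     # (5, 8, 8, 3)
print(y.tolist())  # [1, 1, 1, 0, 0]
```

Here X holds one image per row along the first axis (samples, height, width, channels), and y holds the class label for the image at the same index.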