Neural Networks (NNs)

OVERVIEW

What are Neural Networks (NNs)? 

Neural networks are a type of machine learning algorithm modeled after the structure and function of the human brain. They are composed of layers of interconnected nodes, called neurons, that process and transmit information. Each neuron receives input from the previous layer, processes it, and then passes the output to the next layer until a final output is produced.

The connections between the neurons are weighted, which means that some inputs have a stronger influence on the neuron's output than others. These weights are adjusted during the training process of the neural network to optimize the network's ability to make accurate predictions or classifications.

Neural networks have been successfully applied to a wide range of tasks, including image and speech recognition, natural language processing, and autonomous vehicles. They have become a fundamental tool in the field of machine learning and artificial intelligence, and have enabled significant advances in many areas of research and industry.


In simple terms, a neural network is a computer program that is designed to learn and make predictions based on examples. Just like a human brain, a neural network is made up of interconnected "neurons" that can process and analyze data. The network is trained by feeding it lots of examples and adjusting the connections between the neurons to get better and better at making predictions. For example, a neural network might be trained to recognize images of cats by looking at lots of pictures of cats and adjusting its internal connections until it can accurately identify cats in new images. Once a neural network is trained, it can be used to make predictions on new data that it has never seen before. This makes it a powerful tool for tasks like image and speech recognition, language translation, and many other applications where computers need to make sense of complex data.

Types of Neural Networks

There are several types of neural networks, each with its own structure and purpose, such as feedforward networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs).

Types of Learnings in Neural Networks

There are three types of learning in neural networks:


Supervised Learning:

Supervised Learning is a learning approach that involves a supervisor, similar to learning with a teacher. Training pairs are provided, each consisting of an input and its desired output. The model's output is compared with the desired output to calculate an error signal, which is sent back into the network to adjust the weights until the model's output matches the desired output. In this learning method, the model receives explicit feedback from the environment.

Unsupervised Learning:

Unsupervised Learning is a type of learning that does not require supervision. It is different from supervised learning because there is no feedback from the environment, and no desired output is provided. The model learns on its own through the training phase. During this phase, the inputs are formed into classes that define the similarity of the members. Each class contains input patterns that are similar to one another. When a new pattern is inputted, the model can predict which class the input belongs to based on similarity with other patterns. If there is no such class, a new class is formed.

Reinforcement Learning: 

Reinforcement Learning combines aspects of both Supervised Learning and Unsupervised Learning. It can be thought of as learning with a critic. Unlike in Supervised Learning, the environment does not provide the exact desired output; instead it provides critic feedback, which only tells the model how good its solution is relative to the desired outcome. The model therefore learns on its own from this critic information. Reinforcement Learning resembles Supervised Learning in that it receives feedback from the environment, but it differs in that the feedback is a critique of the model's action rather than the desired output itself.

Forward Propagation and Backward Propagation

Forward propagation is the process of moving the input data through the network, layer by layer, to produce an output. During forward propagation, the input data is multiplied by weights and passed through activation functions to generate an output. The output is then compared to the expected output, and the difference between them is calculated as the error. 

Backpropagation, on the other hand, is the process of propagating the error back through the network to adjust the weights and improve the accuracy of the model. During backpropagation, the error is propagated back through the layers of the network, and the weights are adjusted to minimize the error between the output and the expected output. This process is repeated until the error is minimized, and the model produces accurate predictions.
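
As a minimal sketch of both steps (assuming a single hidden layer, sigmoid activations, a squared-error loss, and arbitrary toy data; none of these choices come from the text above), forward and backward propagation can be written in a few lines of NumPy:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy data: 4 samples with 3 features each; the target simply equals the first feature.
    X = np.array([[0., 1., 0.], [1., 0., 1.], [1., 1., 1.], [0., 0., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input -> hidden weights and biases
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights and biases
    lr = 0.5                                        # learning rate (an assumed value)

    for step in range(1000):
        # Forward propagation: the input flows layer by layer to produce an output.
        h = sigmoid(X @ W1 + b1)        # hidden-layer activations
        y_hat = sigmoid(h @ W2 + b2)    # network output
        error = y_hat - y               # difference from the expected output

        # Backward propagation: push the error back through the layers and adjust the weights.
        d_out = error * y_hat * (1 - y_hat)        # gradient at the output layer
        d_hidden = (d_out @ W2.T) * h * (1 - h)    # gradient at the hidden layer
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hidden)
        b1 -= lr * d_hidden.sum(axis=0)

    print(np.round(y_hat, 2))   # the predictions move toward the targets as training proceeds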

Different Layers in a Neural Network

Neural networks are composed of different layers of neurons, and each layer performs a specific type of computation on the input data. The most common types of layers are the input layer, which receives the raw data; one or more hidden layers, which transform the data through weighted connections and activation functions; and the output layer, which produces the final prediction.

Each layer in a neural network contributes to the overall computation performed by the network, and the choice of layers used depends on the specific problem being solved.


Activation Functions in Neural Networks

An activation function in a neural network is a mathematical function that determines the output of a neuron based on its weighted inputs and bias. The activation function is applied to the weighted sum of the inputs and the bias term to produce an output, which is then fed to the next layer of neurons. The purpose of the activation function is to introduce non-linearity into the output of a neuron, allowing it to model more complex relationships between inputs and outputs. 

During the training of a neural network, the selection of an appropriate activation function is a critical hyper-parameter decision that needs to be made for both the hidden layers and output layer. The activation function determines whether a neuron should be activated or not by computing a weighted sum of the input signals and a bias term. The activation function can take on either linear or non-linear transformations, which are then applied to the input signal. The resulting output from the activation function is then passed as input to the next layer of neurons in the network.

There are several different types of activation functions that can be used in a neural network. Some common choices for activation functions include:

1. Sigmoid Activation Function

The sigmoid function, with its characteristic "S"-shaped curve, is a continuous and differentiable activation function that takes real values as input and produces output between 0 and 1. However, when the input becomes too large (either positive or negative), the function saturates at 0 or 1, and the derivative becomes extremely close to zero. Moreover, the maximum gradient of the sigmoid function is only 0.25, which it attains at x = 0.

As a result, during back-propagation, there is very little gradient to propagate back through the network, and the gradient that does exist becomes diluted as it progresses from the top layers to the lower layers. Therefore, the sigmoid activation function is highly susceptible to the vanishing gradient problem and belongs to the class of saturating activation functions.
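
A minimal NumPy sketch of the sigmoid and its derivative (the function names are illustrative):

    import numpy as np

    def sigmoid(x):
        # S-shaped curve: maps any real input to an output between 0 and 1.
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        # Derivative s * (1 - s): at most 0.25 (at x = 0) and close to 0 for large |x|,
        # which is what makes the gradient vanish in deep networks.
        s = sigmoid(x)
        return s * (1.0 - s)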

2. Hyperbolic Tangent (Tanh) Activation Function

The hyperbolic tangent (tanh) function, like the sigmoid function, has an S-shaped curve that is continuous and differentiable. However, its output values range from -1 to +1, which tends to make the output of each layer more centered around 0. Moreover, the tanh function is better than the sigmoid function because it has a gradient of 1 near the origin. Despite this, the tanh function, like the sigmoid function, suffers from the vanishing gradients problem when the input becomes too large, causing the function to saturate at -1 or +1, and the derivative to be extremely close to zero. Therefore, the tanh activation function also belongs to the class of saturating activation functions.
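
A corresponding sketch for tanh (NumPy provides the function directly):

    import numpy as np

    def tanh(x):
        # S-shaped curve with outputs in (-1, +1), so activations stay centred around 0.
        return np.tanh(x)

    def tanh_grad(x):
        # Derivative 1 - tanh(x)^2: equal to 1 at the origin, but it still approaches 0
        # for large |x|, so tanh saturates just like the sigmoid.
        return 1.0 - np.tanh(x) ** 2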

3. Rectified Linear Unit (ReLU) Activation Function

The ReLU activation function outputs the input unchanged when the input is positive and zero otherwise. Its range is [0, inf) and it is continuous, but not differentiable at x = 0. Despite this, it has several advantages over other activation functions. ReLU is faster to compute than the saturating functions, as it only requires a max operation. It can also introduce sparsity in the model by outputting true zero values. ReLU does not saturate for large positive input values, and its derivative has a constant value of 1 when x > 0, which reduces the likelihood of vanishing gradient problems.

However, ReLU is not without its drawbacks, as some neurons can effectively die during training and stop outputting anything other than 0. This occurs when a neuron's weights get updated such that the weighted sum of its input is negative. The gradient of the ReLU function is 0 when its input is negative, so the neuron is unlikely to recover. To overcome this issue, there is a variant of ReLU called leaky ReLU, which is explained below.
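
A minimal ReLU sketch:

    import numpy as np

    def relu(x):
        # max(0, x): the input itself for positive values, exactly zero otherwise.
        return np.maximum(0.0, x)

    def relu_grad(x):
        # Gradient is 1 for x > 0 and 0 for x < 0 (taken as 0 at x = 0 by convention);
        # a neuron whose inputs stay negative therefore receives no gradient at all,
        # which is the "dying ReLU" problem described above.
        return (x > 0).astype(float)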

4. Leaky Rectified Linear Unit (leaky ReLU) Activation Function

The leaky ReLU activation function is expressed as max(αx, x), where α is a hyper-parameter that determines how much the function "leaks". The slope of the function for x < 0 is α, typically set to a small value such as 0.01. This ensures that leaky ReLUs do not die; instead they can go into a coma state from which they may eventually recover. The derivative of the leaky ReLU function is 1 when x > 0 and α (e.g., 0.01) when x < 0, which avoids a zero gradient for negative inputs.
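
A leaky ReLU sketch, with the leak factor α as a parameter (0.01 is just the conventional default):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # max(alpha * x, x): the small negative slope alpha keeps some gradient
        # flowing for x < 0 instead of cutting it to zero.
        return np.where(x > 0, x, alpha * x)

    def leaky_relu_grad(x, alpha=0.01):
        # Gradient is 1 for x > 0 and alpha for x < 0.
        return np.where(x > 0, 1.0, alpha)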

5. Exponential Linear Unit (ELU) activation function

ELU, short for exponential linear unit, is an activation function that produces negative output values when x < 0, which pushes the average output of neurons closer to 0. The hyper-parameter α determines the value the ELU function approaches when x is a large negative number; it is usually set to 1, but can be tuned like other hyper-parameters. The ELU activation function is differentiable everywhere, including around x = 0, and has a non-zero gradient for x < 0, which prevents the problem of dying neurons. Its main disadvantage is that it is slower to compute than ReLU and its variants because of the computationally expensive exponential function. This drawback is compensated for by a faster convergence rate during training, though an ELU network may still run slower than a ReLU network at test time.
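
An ELU sketch with α = 1 (the usual default):

    import numpy as np

    def elu(x, alpha=1.0):
        # Identity for x > 0; alpha * (exp(x) - 1) otherwise, so the output smoothly
        # approaches -alpha for large negative inputs. np.minimum avoids overflow in exp.
        return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

    def elu_grad(x, alpha=1.0):
        # Gradient is 1 for x > 0 and alpha * exp(x) for x < 0, so it is never exactly zero.
        return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))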

6. Scaled Exponential Linear Unit (SELU) activation function

SELU is an extension of the ELU activation function that uses two fixed parameters, α and λ, whose values are derived analytically rather than learned. For standardized inputs with a mean of 0 and a standard deviation of 1, the suggested values for α and λ are approximately 1.6733 and 1.0507, respectively.

One significant advantage of using SELU is that it provides self-normalization, ensuring that the output from SELU activation maintains a mean of 0 and a standard deviation of 1, thereby solving the vanishing or exploding gradients problem. Self-normalization is achieved when certain conditions are met, including that the neural network only contains a stack of dense layers, all hidden layers use the SELU activation function, input features are standardized, hidden layer weights are initialized with LeCun normal initialization, and the network is sequential.
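
A SELU sketch using the constants quoted above (the function name is illustrative; deep learning frameworks also provide a built-in SELU):

    import numpy as np

    def selu(x, alpha=1.6733, lam=1.0507):
        # Scaled ELU: lam * x for x > 0 and lam * alpha * (exp(x) - 1) otherwise.
        # With standardized inputs and LeCun-normal weight initialization, the
        # activations keep roughly zero mean and unit variance (self-normalization).
        return lam * np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))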

The choice of activation function depends on the specific problem being solved and the structure of the neural network. Experimentation with different activation functions is often necessary to determine the best choice for a given problem.

Explore Yourself

Explore the interactive TensorFlow Playground developed by Google at https://playground.tensorflow.org, where you can experiment with different network architectures, activation functions, and learning rates.

What is an epoch?

In machine learning, an epoch refers to a single pass of the entire training dataset through the neural network. One epoch consists of one forward pass and one backward pass over every training example; with mini-batch gradient descent, a single epoch therefore comprises many weight-update iterations. Typically, neural networks are trained over multiple epochs until the loss on the training dataset converges to a minimum or the accuracy on the validation dataset starts decreasing. The number of epochs to train a neural network is a hyperparameter that needs to be tuned for each specific problem.
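
As a sketch of how this looks in practice (assuming TensorFlow/Keras and a small synthetic dataset; the layer sizes, epoch count, and batch size are arbitrary choices for illustration):

    import numpy as np
    import tensorflow as tf

    # Tiny synthetic dataset, purely for illustration: 200 samples with 3 features each.
    X = np.random.rand(200, 3).astype("float32")
    y = (X.sum(axis=1) > 1.5).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])

    # epochs=30 means 30 full passes over the training data; each epoch performs one
    # weight update per mini-batch of 20 examples, while 20% of the data is held out
    # for validation so that overfitting can be detected.
    model.fit(X, y, epochs=30, batch_size=20, validation_split=0.2)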

How a Node Operates in a Neural Network: An Illustrative Example

Neural networks can be best understood by thinking of each node as an individual linear regression model with its own inputs, weights, bias (or threshold), and output. The output is determined by an activation function that passes the output of one node to the next layer in the network.

Imagine a node as a linear regression model, consisting of input data, weights, a bias, and an output. The formula can be expressed as:

Σ(wᵢxᵢ) + bias = w₁x₁ + w₂x₂ + w₃x₃ + bias

where xᵢ represents the inputs, wᵢ represents the weight assigned to each input, and the bias acts as a threshold offset (a threshold of t corresponds to a bias of −t). The output of the node is determined by passing the sum of the weighted inputs and the bias through an activation function, which produces an output of 1 if the sum is greater than or equal to zero and 0 if the sum is less than zero.

In a neural network, weights are assigned to each input once an input layer is determined. These weights help to determine the significance of each input, with higher weights contributing more to the output. After the weights are assigned, the inputs are multiplied by their respective weights and then added together. The resulting sum is then passed through an activation function which determines the output. If the output exceeds a predetermined threshold, the node "fires" and passes data to the next layer in the network. This process is repeated for each node in each layer, with the output of one node becoming the input of the next node, and so on. This type of network is referred to as a feedforward network.

To demonstrate how a single node in a neural network works, let's consider the example of predicting whether a student will pass or fail a test, represented by binary values (Pass: 1, Fail: 0). There are three factors that affect the outcome: study hours (continuous variable), attendance (Yes: 1, No: 0), and exam difficulty (Easy: 1, Difficult: 0). Assuming the values of these inputs are X1 = 10 (hours studied), X2 = 1 (attended class), and X3 = 0 (difficult exam), we can assign weights to them to reflect their relative importance: for example, W1 = 0.6 (more study hours result in higher scores), W2 = 0.3 (attendance affects performance), and W3 = 0.8 (difficulty level impacts scores). A threshold value of 0.5 is also assumed, which corresponds to a bias value of –0.5. We can then compute the predicted outcome, or y-hat, as Y-hat = (0.6 * 10) + (0.3 * 1) + (0.8 * 0) – 0.5 = 5.8. Since 5.8 is greater than zero, the activation function outputs 1, indicating that the student is likely to pass the test.
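
The same calculation can be written in a few lines of plain Python (the variable names simply mirror the example above):

    inputs  = [10, 1, 0]        # X1 = hours studied, X2 = attendance, X3 = exam difficulty
    weights = [0.6, 0.3, 0.8]   # W1, W2, W3
    bias    = -0.5              # a threshold of 0.5 expressed as a bias of -0.5

    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    output = 1 if weighted_sum >= 0 else 0    # step activation function

    print(weighted_sum, output)  # approximately 5.8 and 1: the node "fires", predicting a pass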

How can neural networks be leveraged with regard to traffic incidents and fatalities?

Neural networks can be leveraged in various ways to help prevent traffic incidents and fatalities. One potential application is in the development of advanced driver assistance systems (ADAS) and autonomous vehicles. Neural networks can be trained on large amounts of data to recognize and respond to different driving scenarios, such as identifying pedestrians, predicting the movements of other vehicles, and detecting potential hazards on the road. This can help to reduce the risk of accidents and improve road safety. 

Neural networks can also be used in the analysis of traffic data to identify patterns and predict accident hotspots. For example, by analyzing historical accident data along with weather conditions, traffic flow, and other factors, neural networks can be used to identify locations and times of day that are most likely to experience accidents. This information can be used to inform traffic management and infrastructure planning decisions. 

Another potential application is in the development of predictive models for traffic fatalities. By analyzing a range of data sources, including traffic volume, weather, road conditions, and driver behavior, neural networks can be trained to predict the likelihood of traffic fatalities occurring in a given area or under specific conditions. This can help to inform policy decisions around road safety and guide the allocation of resources to areas where they are most needed. 

Overall, neural networks have the potential to play an important role in improving road safety and reducing the incidence of traffic accidents and fatalities.

AREA OF INTEREST

The goal is to use a Convolutional Neural Network (CNN) for real-time accident detection from CCTV footage. This is a complex task that requires the CNN to learn how to distinguish various types of accidents based on visual characteristics present in the images. One of the key benefits of using a CNN is that it can automatically extract useful features, eliminating the need for manual feature engineering.

For effective accident detection, the CNN model must be trained on a vast and varied dataset of accident and non-accident images. To this end, images extracted from real-time CCTV footage are being utilized. The dataset consists of diverse accident types, such as pedestrian accidents, bicycle accidents, and car crashes, as well as non-accident images that may contain objects or scenes resembling accidents but are not actual accidents. By employing a well-designed CNN and a meticulously curated dataset, it is possible to build a potent tool for real-time accident detection from CCTV footage.
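
As a rough sketch of what such a model could look like (assuming TensorFlow/Keras, 224x224 RGB frames extracted from the CCTV footage, and a binary accident / non-accident label; the directory layout, layer sizes, and epoch count are illustrative assumptions rather than the project's final design):

    import tensorflow as tf

    # Hypothetical directory of CCTV frames organised into data/accident and data/non_accident.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data/", image_size=(224, 224), batch_size=32)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(224, 224, 3)),
        tf.keras.layers.Rescaling(1.0 / 255),               # scale pixel values to [0, 1]
        tf.keras.layers.Conv2D(16, 3, activation="relu"),   # learn low-level visual features
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),   # learn higher-level patterns
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),     # probability that the frame shows an accident
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(train_ds, epochs=10)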