Neural Networks (NNs)
OVERVIEW
What are Neural Networks (NNs)?
Neural networks are a type of machine learning algorithm modeled after the structure and function of the human brain. They are composed of layers of interconnected nodes, called neurons, that process and transmit information. Each neuron receives input from the previous layer, processes it, and then passes the output to the next layer until a final output is produced.
The connections between the neurons are weighted, which means that some inputs have a stronger influence on the neuron's output than others. These weights are adjusted during the training process of the neural network to optimize the network's ability to make accurate predictions or classifications.
Neural networks have been successfully applied to a wide range of tasks, including image and speech recognition, natural language processing, and autonomous vehicles. They have become a fundamental tool in the field of machine learning and artificial intelligence, and have enabled significant advances in many areas of research and industry.
In simple terms, a neural network is a computer program that is designed to learn and make predictions based on examples. Just like a human brain, a neural network is made up of interconnected "neurons" that can process and analyze data. The network is trained by feeding it lots of examples and adjusting the connections between the neurons to get better and better at making predictions. For example, a neural network might be trained to recognize images of cats by looking at lots of pictures of cats and adjusting its internal connections until it can accurately identify cats in new images. Once a neural network is trained, it can be used to make predictions on new data that it has never seen before. This makes it a powerful tool for tasks like image and speech recognition, language translation, and many other applications where computers need to make sense of complex data.
Types of Neural Networks
There are several types of neural networks, each with its own structure and purpose. Below are some of the most common types:
ANN – An artificial neural network (ANN) is a feed-forward network in which information flows in one direction, from the input layer toward the output layer. It may also contain one or more hidden layers, each with a fixed number of neurons chosen by the designer. ANNs are commonly used for processing textual or tabular data, and one popular real-life application is facial recognition. For image and sequence data, however, they are generally less effective than CNNs and RNNs.
CNN – Convolutional Neural Networks (CNNs) are used mainly for image data and computer vision tasks, such as detecting objects around autonomous vehicles. They are built from convolutional layers (typically followed by pooling and fully connected layers), which lets them learn spatial features directly from pixels and makes them far better suited to visual data than plain ANNs or RNNs.
RNN – Recurrent Neural Networks (RNNs) are designed to process and interpret sequential data such as time series. The output of a processing node is fed back into nodes in the same or previous layers, giving the network a form of memory. The best-known RNN variant is the Long Short-Term Memory (LSTM) network.
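As a rough illustration of how these three architectures differ, the sketch below defines one of each with the Keras API. The layer sizes, input shapes, and output dimensions are arbitrary placeholders chosen for the example, not recommendations from the text.

```python
# Minimal sketches of the three architectures using the Keras API.
from tensorflow import keras
from tensorflow.keras import layers

# ANN: a plain feed-forward network for tabular data with 20 features.
ann = keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),    # hidden layer
    layers.Dense(1, activation="sigmoid"),  # binary output
])

# CNN: convolutional and pooling layers for 64x64 RGB images.
cnn = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# RNN: an LSTM for sequences of length 50 with 8 features per time step.
rnn = keras.Sequential([
    layers.Input(shape=(50, 8)),
    layers.LSTM(32),
    layers.Dense(1),
])
```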
Types of Learnings in Neural Networks
There are three types of learning in neural networks:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Supervised Learning:
Supervised Learning is a learning approach that involves a supervisor, similar to learning with a teacher. Input training pairs are provided, which consist of a set of inputs and the desired output. The model's output is then compared with the desired output to calculate an error signal. This error signal is sent back into the network for weight adjustments until the model's output matches the desired output. This learning method involves feedback from the environment to the model.
Unsupervised Learning:
Unsupervised Learning is a type of learning that does not require supervision. It is different from supervised learning because there is no feedback from the environment, and no desired output is provided. The model learns on its own through the training phase. During this phase, the inputs are formed into classes that define the similarity of the members. Each class contains input patterns that are similar to one another. When a new pattern is inputted, the model can predict which class the input belongs to based on similarity with other patterns. If there is no such class, a new class is formed.
Reinforcement Learning:
Reinforcement Learning combines aspects of both Supervised and Unsupervised Learning. It can be thought of as learning with a critic: the environment does not provide the exact desired output, but only a critique signal that tells the model how close its solution is to the desired one. The model therefore learns on its own from this critique information. It is similar to Supervised Learning in that it receives feedback from the environment, but it differs in that it is never told the correct answer, only how good its current behaviour is.
Forward Propagation and Backward Propagation
Forward propagation is the process of moving the input data through the network, layer by layer, to produce an output. During forward propagation, the input data is multiplied by weights and passed through activation functions to generate an output. The output is then compared to the expected output, and the difference between them is calculated as the error.
Backpropagation, on the other hand, is the process of propagating the error back through the network to adjust the weights and improve the accuracy of the model. During backpropagation, the error is propagated back through the layers of the network, and the weights are adjusted to minimize the error between the output and the expected output. This process is repeated until the error is minimized, and the model produces accurate predictions.
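To make the two passes concrete, here is a minimal NumPy sketch of one training step for a tiny one-hidden-layer network with a sigmoid activation and mean squared error loss. The layer sizes, learning rate, and random data are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 samples, 3 input features
y = rng.normal(size=(4, 1))          # target values

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # input -> hidden weights
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)   # hidden -> output weights
lr = 0.1                                        # learning rate

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward propagation: inputs are multiplied by weights and passed
# through the activation function, layer by layer.
z1 = x @ W1 + b1
a1 = sigmoid(z1)
y_hat = a1 @ W2 + b2
error = np.mean((y_hat - y) ** 2)    # mean squared error

# Backpropagation: propagate the error backwards through the layers
# and compute the gradient of the error with respect to each weight.
d_yhat = 2 * (y_hat - y) / len(x)
dW2 = a1.T @ d_yhat
db2 = d_yhat.sum(axis=0)
d_a1 = d_yhat @ W2.T
d_z1 = d_a1 * a1 * (1 - a1)          # derivative of the sigmoid
dW1 = x.T @ d_z1
db1 = d_z1.sum(axis=0)

# Gradient descent update: adjust the weights to reduce the error.
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
```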
Different Layers in a Neural Network
Neural networks are composed of different layers of neurons, and each layer performs a specific type of computation on the input data. The most common types of layers in a neural network are:
Input layer: The input layer is the first layer of the neural network, which receives the input data and passes it to the next layer.
Hidden layer: The hidden layer is a layer between the input layer and output layer, where computations are performed by the neural network. There can be one or more hidden layers in a neural network.
Output layer: The output layer is the final layer of the neural network, which produces the output based on the input data and computations performed by the hidden layers.
Convolutional layer: Convolutional layers are commonly used in image processing applications. They apply convolutional operations to the input data to extract features that are relevant to the task.
Pooling layer: Pooling layers are also used in image processing applications. They reduce the size of the feature maps by summarizing the information in the neighboring regions.
Recurrent layer: Recurrent layers are used in applications where sequential data is processed, such as speech recognition and language modeling.
Dropout layer: Dropout layers are used to prevent overfitting in the neural network. They randomly drop out a fraction of the neurons in the layer during training.
Batch normalization layer: Batch normalization layers are used to normalize the input data to the layer. They help in faster convergence of the neural network by reducing the internal covariate shift.
Flattening Layer: This layer is used to convert a multidimensional input tensor into a one-dimensional tensor. It is typically used to prepare the input data for a dense layer, which requires a one-dimensional input. The flattening layer does not change the batch size of the input data.
Dense Layer: This is a fully connected layer in which each neuron is connected to every neuron in the previous layer. It is used for learning non-linear relationships between input data and output data. The dense layer has a weight matrix and bias vector, which are adjusted during training to optimize the model's performance. The output of a dense layer is obtained by applying an activation function to the weighted sum of the inputs and biases.
Each layer in a neural network contributes to the overall computation performed by the network, and the choice of layers used depends on the specific problem being solved.
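As a sketch of how these layer types fit together, the Keras model below stacks an input layer, convolutional, batch normalization, pooling, flattening, dense, dropout, and output layers for a hypothetical 10-class image task. All sizes are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),          # input layer (32x32 RGB images)
    layers.Conv2D(32, 3, activation="relu"),  # convolutional layer
    layers.BatchNormalization(),              # batch normalization layer
    layers.MaxPooling2D(2),                   # pooling layer
    layers.Flatten(),                         # flattening layer
    layers.Dense(128, activation="relu"),     # dense (fully connected) hidden layer
    layers.Dropout(0.5),                      # dropout layer to reduce overfitting
    layers.Dense(10, activation="softmax"),   # output layer
])
model.summary()
```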
Activation Functions in Neural Networks
An activation function in a neural network is a mathematical function that determines the output of a neuron based on its weighted inputs and bias. The activation function is applied to the weighted sum of the inputs and the bias term to produce an output, which is then fed to the next layer of neurons. The purpose of the activation function is to introduce non-linearity into the output of a neuron, allowing it to model more complex relationships between inputs and outputs.
Selecting an appropriate activation function for both the hidden layers and the output layer is an important hyper-parameter decision when training a neural network. The activation function determines whether, and how strongly, a neuron is activated: a linear or non-linear transformation is applied to the weighted sum of the input signals plus the bias term, and the resulting value is passed as input to the next layer of neurons in the network.
There are several different types of activation functions that can be used in a neural network. Some common choices for activation functions include:
1. Sigmoid Activation Function
The sigmoid function, with its characteristic "S"-shaped curve, is a continuous and differentiable activation function that takes real values as input and produces output between 0 and 1. However, when the input becomes too large (either positive or negative), the function saturates at 0 or 1 and the derivative becomes extremely close to zero. Moreover, the maximum gradient of the sigmoid function is only 0.25, attained at x = 0.
As a result, during back-propagation, there is very little gradient to propagate back through the network, and the gradient that does exist becomes diluted as it progresses from the top layers to the lower layers. Therefore, the sigmoid activation function is highly susceptible to the vanishing gradient problem and belongs to the class of saturating activation functions.
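A small NumPy sketch makes the saturation behaviour easy to see: the derivative peaks at 0.25 at the origin and is almost zero for large positive or negative inputs. The sample input values are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))       # saturates near 0 and 1 for large |x|
print(sigmoid_grad(x))  # peaks at 0.25 when x = 0, vanishes for large |x|
```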
2. Hyperbolic Tangent (Tanh) Activation Function
The hyperbolic tangent (tanh) function, like the sigmoid function, has an S-shaped curve that is continuous and differentiable. However, its output values range from -1 to +1, which tends to make the output of each layer more centered around 0. Moreover, the tanh function is better than the sigmoid function because it has a gradient of 1 near the origin. Despite this, the tanh function, like the sigmoid function, suffers from the vanishing gradients problem when the input becomes too large, causing the function to saturate at -1 or +1, and the derivative to be extremely close to zero. Therefore, the tanh activation function also belongs to the class of saturating activation functions.
3. Rectified Linear Unit (ReLU) Activation Function
The ReLU activation function outputs the input unchanged when it is positive and zero otherwise. Its range is [0, inf) and it is continuous, but not differentiable at x = 0. Despite this, it has several advantages over other activation functions. ReLU is faster to compute, as it only requires a max operation. It can also introduce sparsity into the model by outputting true zero values. ReLU does not saturate for large positive input values, and its derivative is a constant 1 when x > 0, which reduces the likelihood of vanishing gradient problems.
However, ReLU is not without its drawbacks, as some neurons can effectively die during training and stop outputting anything other than 0. This occurs when a neuron's weights get updated such that the weighted sum of its input is negative. The gradient of the ReLU function is 0 when its input is negative, so the neuron is unlikely to recover. To overcome this issue, there is a variant of ReLU called leaky ReLU, which is explained below.
4. Leaky Rectified Linear Unit (leaky ReLU) Activation Function
The leaky ReLU activation function is expressed as max(αx, x), where α is a hyper-parameter that determines how much the function should "leak". The slope of the function for x < 0 is given by α and is typically set to a small value such as 0.01. This ensures that leaky ReLU neurons do not die; at worst they go into a kind of coma from which they may eventually recover. The gradient of the leaky ReLU function is a constant 1 when x > 0 and α (e.g., 0.01) when x < 0, which helps avoid the vanishing gradient problem.
5. Exponential Linear Unit (ELU) activation function
ELU, short for exponential linear unit, is an activation function that produces negative output values when x < 0, which enables the average output of neurons to be closer to 0. The hyper-parameter "a" (also denoted as α) determines the value the ELU function approaches when x is a large negative number, usually set to 1, but can be adjusted like other hyper-parameters. ELU activation function is differentiable everywhere, including around x=0, and has a non-zero gradient for x < 0, which prevents the problem of dying neurons. The main disadvantage of ELU is that it is computationally slower compared to ReLU and its variants due to the use of the computationally expensive exponential function. Nonetheless, this drawback is compensated by faster convergence rate during training, though an ELU network may run slower than a ReLU network during test time.
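The following NumPy sketch compares ReLU, leaky ReLU, and ELU on a few sample inputs; the default α values match the typical choices mentioned above (0.01 for leaky ReLU, 1 for ELU).

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # max(alpha*x, x): keeps a small slope for negative inputs
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # smooth negative branch that approaches -alpha for large negative x
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(relu(x))        # [0.  0.  0.  1.  3.]
print(leaky_relu(x))  # [-0.03 -0.01  0.    1.    3.  ]
print(elu(x))         # [-0.95... -0.63...  0.  1.  3.]
```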
6. Scaled Exponential Linear Unit (SELU) activation function
SELU is an extension of the ELU activation function that includes two fixed parameters, α and λ. These are not learned from the data; for standardized inputs with a mean of 0 and a standard deviation of 1, the derived values are α ≈ 1.6733 and λ ≈ 1.0507.
One significant advantage of using SELU is that it provides self-normalization, ensuring that the output from SELU activation maintains a mean of 0 and a standard deviation of 1, thereby solving the vanishing or exploding gradients problem. Self-normalization is achieved when certain conditions are met, including that the neural network only contains a stack of dense layers, all hidden layers use the SELU activation function, input features are standardized, hidden layer weights are initialized with LeCun normal initialization, and the network is sequential.
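In Keras, the conditions above translate into a sequential stack of dense layers with the SELU activation and LeCun normal initialization, as in the sketch below. The layer sizes and input dimension are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A self-normalizing dense stack: SELU activation + LeCun normal initialization,
# assuming the 20 input features have been standardized beforehand.
model = keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="selu", kernel_initializer="lecun_normal"),
    layers.Dense(64, activation="selu", kernel_initializer="lecun_normal"),
    layers.Dense(1, activation="sigmoid"),
])
```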
The choice of activation function depends on the specific problem being solved and the structure of the neural network. Experimentation with different activation functions is often necessary to determine the best choice for a given problem.
Explore Yourself
You can explore these ideas interactively in the TensorFlow Playground developed by Google (playground.tensorflow.org), which lets you build and train small neural networks directly in the browser.
What is an epoch?
In machine learning, an epoch refers to one complete pass of the entire training dataset through the neural network. With full-batch gradient descent, an epoch corresponds to a single forward pass and backward pass; with mini-batch training, an epoch is made up of many such iterations, one per batch. Typically, neural networks are trained over multiple epochs until the loss on the training dataset converges to a minimum or the accuracy on the validation dataset starts decreasing. The number of epochs to train a neural network is a hyperparameter that needs to be tuned for each specific problem.
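In Keras, the number of epochs is simply an argument to model.fit. The sketch below uses random placeholder data, a tiny model, 20 epochs, and a batch size of 32; all of these values are assumptions for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical data: 100 samples with 10 features each, binary labels.
X_train = np.random.rand(100, 10)
y_train = np.random.randint(0, 2, size=(100, 1))

model = keras.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(
    X_train, y_train,
    epochs=20,             # 20 complete passes over the training data
    batch_size=32,         # each epoch consists of several mini-batch iterations
    validation_split=0.2,  # track validation metrics to decide when to stop
)
```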
Understanding how a node operates in a neural network through an illustrative example
Neural networks can be best understood by thinking of each node as an individual linear regression model with its own inputs, weights, bias (or threshold), and output. The output is determined by an activation function that passes the output of one node to the next layer in the network.
Imagine a node as a linear regression model, consisting of input data, weights, a bias, and an output. The formula can be expressed as:
output = Σ(wi × xi) + bias = (w1 × x1) + (w2 × x2) + (w3 × x3) + bias
where xi represents the input, wi represents the weight assigned to each input, and bias represents a threshold value. The output of the node is determined by passing the sum of the weighted inputs and bias through an activation function, which results in an output of 1 if the sum is greater than or equal to zero, and 0 if the sum is less than zero.
In a neural network, weights are assigned to each input once an input layer is determined. These weights help to determine the significance of each input, with higher weights contributing more to the output. After the weights are assigned, the inputs are multiplied by their respective weights and then added together. The resulting sum is then passed through an activation function which determines the output. If the output exceeds a predetermined threshold, the node "fires" and passes data to the next layer in the network. This process is repeated for each node in each layer, with the output of one node becoming the input of the next node, and so on. This type of network is referred to as a feedforward network.
To demonstrate how a single node in a neural network works, let's consider the example of predicting whether a student will pass or fail a test, represented by binary values (Pass: 1, Fail: 0). There are three factors that affect the outcome: study hours (continuous variable), attendance (Yes: 1, No: 0), and exam difficulty (Easy: 1, Difficult: 0). Assume the inputs take the values X1 = 10 (hours studied), X2 = 1 (attended class), and X3 = 0 (difficult exam). We assign weights to indicate their relative importance, for example W1 = 0.6 (more study hours result in higher scores), W2 = 0.3 (attendance affects performance), and W3 = 0.8 (difficulty level impacts scores), together with a threshold of 0.5, which corresponds to a bias value of –0.5. The predicted outcome, y-hat, is then Y-hat = (0.6 × 10) + (0.3 × 1) + (0.8 × 0) – 0.5 = 5.8. Since this value is greater than zero, the node fires and outputs 1, indicating that the student is likely to pass the test.
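The same single-node computation can be written out in a few lines of Python; the weights, bias, and inputs are the assumed values from the example above.

```python
x = [10, 1, 0]          # hours studied, attended class, easy exam (0 = difficult)
w = [0.6, 0.3, 0.8]     # weights expressing the importance of each input
bias = -0.5             # threshold of 0.5 expressed as a bias

weighted_sum = sum(wi * xi for wi, xi in zip(w, x)) + bias
output = 1 if weighted_sum >= 0 else 0   # step activation function

print(weighted_sum)  # 5.8
print(output)        # 1 -> the student is predicted to pass
```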
In what capacity can neural networks be leveraged with regard to traffic incidents and fatalities?
Neural networks can be leveraged in various ways to help prevent traffic incidents and fatalities. One potential application is in the development of advanced driver assistance systems (ADAS) and autonomous vehicles. Neural networks can be trained on large amounts of data to recognize and respond to different driving scenarios, such as identifying pedestrians, predicting the movements of other vehicles, and detecting potential hazards on the road. This can help to reduce the risk of accidents and improve road safety.
Neural networks can also be used in the analysis of traffic data to identify patterns and predict accident hotspots. For example, by analyzing historical accident data along with weather conditions, traffic flow, and other factors, neural networks can be used to identify locations and times of day that are most likely to experience accidents. This information can be used to inform traffic management and infrastructure planning decisions.
Another potential application is in the development of predictive models for traffic fatalities. By analyzing a range of data sources, including traffic volume, weather, road conditions, and driver behavior, neural networks can be trained to predict the likelihood of traffic fatalities occurring in a given area or under specific conditions. This can help to inform policy decisions around road safety and guide the allocation of resources to areas where they are most needed.
Overall, neural networks have the potential to play an important role in improving road safety and reducing the incidence of traffic accidents and fatalities.
AREA OF INTEREST
The goal is to use a Convolutional Neural Network (CNN) for real-time accident detection from CCTV footage. This is a complex task that requires the CNN to learn how to distinguish various types of accidents based on visual characteristics present in the images. One of the key benefits of using a CNN is that it can automatically extract useful features, eliminating the need for manual feature engineering.
For effective accident detection, the CNN model must be trained on a vast and varied dataset of accident and non-accident images. To this end, images extracted from real-time CCTV footage are being utilized. The dataset consists of diverse accident types, such as pedestrian accidents, bicycle accidents, and car crashes, as well as non-accident images that may contain objects or scenes resembling accidents but are not actual accidents. By employing a well-designed CNN and a meticulously curated dataset, it is possible to build a potent tool for real-time accident detection from CCTV footage.
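The sketch below shows one way such a binary accident / non-accident classifier could be set up in Keras. The directory layout ("dataset/" with one subfolder per class), image size, layer sizes, and epoch count are all assumptions for illustration, not details of the actual project.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical dataset folder with "accident" and "non_accident" subfolders.
train_ds = keras.utils.image_dataset_from_directory(
    "dataset/",
    image_size=(128, 128),
    batch_size=32,
)

model = keras.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Rescaling(1.0 / 255),              # normalize pixel values
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),    # accident vs. non-accident
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```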