Neural Network Fundamentals
Neural networks form the foundation of deep learning, mimicking biological neurons to process information. In financial services, they power fraud detection systems and credit scoring models. In retail, they enable recommendation engines and inventory optimization.
Mathematical Foundation
Perceptron Model
The basic building block of neural networks:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Symbol Definitions:
- $y$ = Output activation (neuron's response)
- $w_i$ = Weight for input $i$ (connection strength)
- $x_i$ = $i$-th input feature (data value)
- $b$ = Bias term (threshold adjustment)
- $f$ = Activation function (non-linearity)
- $n$ = Number of input features
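As a concrete illustration, here is a minimal NumPy sketch of the perceptron forward pass; the inputs, weights, and step activation below are hypothetical values, not parameters from any trained model.

```python
import numpy as np

def perceptron(x, w, b, f):
    """Single-neuron forward pass: y = f(sum_i w_i * x_i + b)."""
    return f(np.dot(w, x) + b)

# Hypothetical values for illustration only (not a trained model).
x = np.array([0.5, -1.2, 3.0])   # input features x_i
w = np.array([0.8, 0.1, -0.4])   # weights w_i
b = 0.2                          # bias term b

step = lambda z: 1.0 if z > 0 else 0.0   # classic step activation
print(perceptron(x, w, b, step))         # -> 0.0 for these inputs
```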
Multilayer Perceptron (MLP)
Stacked layers create complex decision boundaries:

$$a^{(l)} = f^{(l)}\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$$

Symbol Definitions:
- $a^{(l)}$ = Activation vector at layer $l$ (layer outputs)
- $W^{(l)}$ = Weight matrix for layer $l$ (learned parameters)
- $b^{(l)}$ = Bias vector for layer $l$ (layer thresholds)
- $f^{(l)}$ = Activation function for layer $l$
- $l$ = Layer index (0 = input, $L$ = output)
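A minimal sketch of the layer-by-layer forward pass; the two layers use randomly initialized weights purely to show the mechanics.

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)

def mlp_forward(a, layers):
    """Repeatedly apply a = f(W @ a + b), one (W, b, f) triple per layer."""
    for W, b, f in layers:
        a = f(W @ a + b)
    return a

# Two layers with random weights, for illustration only: 4 -> 3 -> 1.
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(3, 4)), np.zeros(3), relu),
    (rng.normal(size=(1, 3)), np.zeros(1), relu),
]
print(mlp_forward(rng.normal(size=4), layers))
```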
Activation Functions
Sigmoid Function
Smooth S-shaped curve for binary classification:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Symbol Definitions:
- $\sigma(z)$ = Sigmoid output (value between 0 and 1)
- $z$ = Linear combination of inputs (pre-activation)
- $e$ = Euler's number (≈ 2.718)

Properties:
- Range: $(0, 1)$ - suitable for probabilities
- Differentiable everywhere - enables gradient descent
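A one-line NumPy implementation of the sigmoid; the example inputs are arbitrary.

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)); squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # [0.119... 0.5 0.880...]
```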
ReLU (Rectified Linear Unit)
Most popular activation for hidden layers:

$$\text{ReLU}(z) = \max(0, z)$$

Symbol Definitions:
- $\text{ReLU}(z)$ = ReLU output (non-negative values)
- $\max$ = Maximum function (selects the larger value)

Advantages:
- Computationally efficient (simple max operation)
- Reduces the vanishing gradient problem
- Sparse activation (many neurons output zero)
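The corresponding NumPy sketch for ReLU, showing the sparse (zeroed) outputs for negative inputs:

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

print(relu(np.array([-1.5, 0.0, 2.3])))  # [0. 0. 2.3] -- negatives zeroed out
```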
Financial Services Example: Credit Card Fraud Detection
Business Context: A financial institution uses neural networks to detect fraudulent credit card transactions in real-time, protecting customers and reducing financial losses.
Input Features:
- $x_1$ = Transaction amount (dollars)
- $x_2$ = Time since last transaction (minutes)
- $x_3$ = Merchant category code (encoded)
- $x_4$ = Geographic distance from previous transaction (miles)
- $x_5$ = Account age (days)
Network Architecture:
- Input Layer: 5 neurons (features)
- Hidden Layer 1: 20 neurons with ReLU activation
- Hidden Layer 2: 10 neurons with ReLU activation
- Output Layer: 1 neuron with sigmoid activation (fraud probability)
Forward Propagation:

Layer 1 (Hidden):

$$h^{(1)} = \text{ReLU}\left(W^{(1)} x + b^{(1)}\right)$$

Layer 2 (Hidden):

$$h^{(2)} = \text{ReLU}\left(W^{(2)} h^{(1)} + b^{(2)}\right)$$

Output Layer:

$$\hat{y} = \sigma\left(W^{(3)} h^{(2)} + b^{(3)}\right)$$

Symbol Definitions:
- $x$ = Input feature vector (transaction data)
- $h^{(1)}, h^{(2)}$ = Hidden layer activations
- $\hat{y}$ = Predicted fraud probability (0 to 1)
- $\sigma$ = Sigmoid activation function
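A sketch of this 5-20-10-1 forward pass in NumPy. The weights are randomly initialized and the transaction values are hypothetical; a production system would load trained parameters and normalize the raw features first.

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Randomly initialized parameters for the 5-20-10-1 architecture above;
# a real system would load trained weights and normalize the features.
rng = np.random.default_rng(42)
W1, b1 = rng.normal(scale=0.1, size=(20, 5)), np.zeros(20)
W2, b2 = rng.normal(scale=0.1, size=(10, 20)), np.zeros(10)
W3, b3 = rng.normal(scale=0.1, size=(1, 10)), np.zeros(1)

# One hypothetical transaction: amount, minutes since last transaction,
# encoded merchant category, distance in miles, account age in days.
x = np.array([250.0, 12.0, 3.0, 40.0, 730.0])

h1 = relu(W1 @ x + b1)           # Hidden layer 1
h2 = relu(W2 @ h1 + b2)          # Hidden layer 2
y_hat = sigmoid(W3 @ h2 + b3)    # Fraud probability in (0, 1)
print(y_hat.item())
```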
Loss Function (Binary Cross-Entropy):

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]$$

Symbol Definitions:
- $\mathcal{L}$ = Average loss across the batch (prediction error)
- $N$ = Batch size (number of transactions)
- $y_i$ = True label for transaction $i$ (0 = legitimate, 1 = fraud)
- $\hat{y}_i$ = Predicted probability for transaction $i$
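A minimal implementation of the batch-averaged binary cross-entropy; the clipping term is an added numerical safeguard against log(0), and the label/prediction arrays are illustrative.

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy averaged over the batch; eps guards log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

y_true = np.array([0.0, 0.0, 1.0, 0.0])       # 0 = legitimate, 1 = fraud
y_pred = np.array([0.10, 0.05, 0.80, 0.30])   # model outputs
print(bce_loss(y_true, y_pred))
```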
Business Impact:
- Detection Rate: 95% fraud detection with 0.1% false positive rate
- Response Time: Sub-second decision making for real-time blocking
- Cost Savings: $50M annual fraud prevention
Retail Example: Customer Lifetime Value Prediction
Business Context: A retail chain predicts customer lifetime value (CLV) to optimize marketing spend and customer acquisition strategies.
Input Features:
- $x_1$ = Average order value (dollars)
- $x_2$ = Purchase frequency (orders per month)
- $x_3$ = Customer tenure (months since first purchase)
- $x_4$ = Product category diversity (number of categories)
- $x_5$ = Returns ratio (returns/purchases)
- $x_6$ = Seasonal purchasing pattern (encoded)
Network Architecture:
- Input Layer: 6 neurons
- Hidden Layer 1: 32 neurons with ReLU
- Hidden Layer 2: 16 neurons with ReLU
- Hidden Layer 3: 8 neurons with ReLU
- Output Layer: 1 neuron with linear activation (CLV prediction)
Mathematical Formulation:

Hidden Layers:

$$h^{(l)} = \text{ReLU}\left(W^{(l)} h^{(l-1)} + b^{(l)}\right), \quad l = 1, 2, 3, \quad h^{(0)} = x$$

Output (CLV Prediction):

$$\hat{v} = W^{(4)} h^{(3)} + b^{(4)}$$

Loss Function (Mean Squared Error):

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left( v_i - \hat{v}_i \right)^2$$

Symbol Definitions:
- $\hat{v}_i$ = Predicted customer lifetime value for customer $i$ (dollars)
- $v_i$ = Actual CLV for customer $i$
- $\mathcal{L}$ = Mean squared error loss
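A short MSE sketch over hypothetical CLV values; the dollar figures are illustrative only.

```python
import numpy as np

def mse_loss(v_true, v_pred):
    """Mean squared error over a batch of CLV predictions."""
    return np.mean((v_true - v_pred) ** 2)

v_true = np.array([1200.0, 450.0, 3100.0])   # actual CLV (dollars)
v_pred = np.array([1100.0, 520.0, 2900.0])   # predicted CLV
print(mse_loss(v_true, v_pred))
```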
Business Applications:

1. Customer Segmentation: customers are grouped into value tiers (e.g., high, medium, low) by thresholding the predicted CLV $\hat{v}_i$.

2. Marketing Budget Allocation:

$$\text{Budget}_i = \alpha \cdot \hat{v}_i \cdot P(\text{conversion}_i)$$

Symbol Definitions:
- $\alpha$ = Marketing efficiency coefficient (0.1 = 10% of CLV)
- $P(\text{conversion}_i)$ = Likelihood of successful conversion for customer $i$
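A sketch of the allocation rule with hypothetical CLV and conversion estimates; $\alpha = 0.1$ follows the definition above.

```python
import numpy as np

alpha = 0.1                               # 10% of predicted CLV, per the definition above
clv = np.array([1200.0, 450.0, 3100.0])   # predicted CLV per customer (hypothetical)
p_conv = np.array([0.30, 0.10, 0.55])     # estimated conversion likelihood (hypothetical)

budget = alpha * clv * p_conv             # per-customer marketing spend (dollars)
print(budget)                             # [ 36.    4.5 170.5]
```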
Business Impact:
- Prediction Accuracy: R² = 0.87 for CLV prediction
- Marketing ROI: 40% improvement through targeted spending
- Customer Acquisition: 25% more efficient budget allocation
Training Process
Gradient Descent Optimization
Weight updates based on the loss gradient:

$$w \leftarrow w - \eta \, \frac{\partial \mathcal{L}}{\partial w}$$

Symbol Definitions:
- $\eta$ = Learning rate (step size parameter)
- $\partial \mathcal{L} / \partial w$ = Gradient of the loss w.r.t. the weights
- $\leftarrow$ = Assignment operator (weight update)
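One vanilla gradient-descent update as a short NumPy function; the weight and gradient values are arbitrary placeholders.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """One vanilla update: w <- w - eta * dL/dw."""
    return w - lr * grad

w = np.array([0.5, -0.3])        # current weights
grad = np.array([0.2, -0.1])     # gradient of the loss w.r.t. w
print(sgd_step(w, grad))         # [ 0.498 -0.299]
```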
Backpropagation Algorithm
Efficient gradient computation using the chain rule:

$$\delta^{(l)} = \left( \left( W^{(l+1)} \right)^T \delta^{(l+1)} \right) \odot f'^{(l)}\left( z^{(l)} \right)$$

Symbol Definitions:
- $\delta^{(l)}$ = Error term for layer $l$ (gradient signal)
- $z^{(l)}$ = Pre-activation values at layer $l$
- $\odot$ = Element-wise multiplication (Hadamard product)
- $f'^{(l)}$ = Derivative of the activation function at layer $l$
- $(W^{(l+1)})^T$ = Transpose of the weight matrix
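A sketch of the layer-$l$ error-term computation for a ReLU layer; shapes and values are arbitrary placeholders chosen to match the fraud network's 20-unit and 10-unit hidden layers.

```python
import numpy as np

def backprop_layer(delta_next, W_next, z, f_prime):
    """delta^(l) = (W^(l+1)^T @ delta^(l+1)) * f'(z^(l)), element-wise."""
    return (W_next.T @ delta_next) * f_prime(z)

relu_prime = lambda z: (z > 0).astype(float)   # derivative of ReLU

rng = np.random.default_rng(0)
delta_2 = rng.normal(size=10)     # error term already computed for layer l+1
W_2 = rng.normal(size=(10, 20))   # weights connecting layer l (20 units) to l+1 (10 units)
z_1 = rng.normal(size=20)         # pre-activations at layer l

delta_1 = backprop_layer(delta_2, W_2, z_1, relu_prime)
print(delta_1.shape)              # (20,) -- one error term per layer-l neuron
```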
Regularization Techniques
L2 Regularization (Weight Decay)
Prevents overfitting by penalizing large weights:

$$\mathcal{L}_{\text{total}} = \mathcal{L} + \lambda \sum_{l} \left\| W^{(l)} \right\|_F^2$$

Symbol Definitions:
- $\lambda$ = Regularization strength (penalty coefficient)
- $\| W^{(l)} \|_F^2$ = Frobenius norm squared (sum of squared weights)
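A minimal penalty computation summed over hypothetical layer weight matrices; the data-loss value is a stand-in.

```python
import numpy as np

def l2_penalty(weight_matrices, lam=1e-4):
    """lambda * sum over layers of the squared Frobenius norm."""
    return lam * sum(np.sum(W ** 2) for W in weight_matrices)

rng = np.random.default_rng(0)
weights = [rng.normal(size=(20, 5)), rng.normal(size=(1, 20))]
data_loss = 0.37                          # stand-in for the unregularized loss
print(data_loss + l2_penalty(weights))    # total regularized loss
```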
Dropout
Randomly sets activations to zero during training:

$$\tilde{a}^{(l)} = m \odot a^{(l)}, \quad m_j \sim \text{Bernoulli}(p)$$

Symbol Definitions:
- $m$ = Random binary mask (0 or 1 for each neuron)
- $\tilde{a}^{(l)}$ = Dropout-applied activations
- $p$ = Probability that a neuron is kept active
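A sketch of the mask-and-scale step. The $1/p$ rescaling is the common "inverted dropout" variant, an assumption beyond the formula above; it keeps expected activations unchanged at inference time.

```python
import numpy as np

def dropout(a, keep_prob=0.8, rng=None):
    """Zero each activation with probability 1 - keep_prob, then rescale
    by 1/keep_prob (inverted dropout) so expectations match at inference."""
    rng = np.random.default_rng(0) if rng is None else rng
    mask = (rng.random(a.shape) < keep_prob).astype(float)   # binary mask m
    return a * mask / keep_prob

a = np.ones(10)       # activations before dropout
print(dropout(a))     # roughly 20% zeros, survivors scaled to 1.25
```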
Performance Metrics
Classification Metrics
For fraud detection and binary classification:

Precision:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall (Sensitivity):

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1-Score:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Symbol Definitions:
- $TP$, $FP$, $FN$ = Counts of true positives, false positives, and false negatives
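A self-contained computation of all three metrics from binary labels; the example arrays are arbitrary.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 from binary labels and predictions."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
print(precision_recall_f1(y_true, y_pred))  # (0.666..., 0.666..., 0.666...)
```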
Regression Metrics
For CLV prediction and continuous values:

Mean Absolute Error (MAE):

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| v_i - \hat{v}_i \right|$$

Root Mean Squared Error (RMSE):

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( v_i - \hat{v}_i \right)^2}$$
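Both regression metrics in a few lines, reusing the hypothetical CLV arrays from the retail example.

```python
import numpy as np

def mae(v_true, v_pred):
    """Mean absolute error: average |v - v_hat|, in the target's units."""
    return np.mean(np.abs(v_true - v_pred))

def rmse(v_true, v_pred):
    """Root mean squared error: penalizes large errors more heavily than MAE."""
    return np.sqrt(np.mean((v_true - v_pred) ** 2))

v_true = np.array([1200.0, 450.0, 3100.0])   # hypothetical actual CLV
v_pred = np.array([1100.0, 520.0, 2900.0])   # hypothetical predictions
print(mae(v_true, v_pred), rmse(v_true, v_pred))
```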
Neural networks provide a flexible framework for learning complex patterns in both financial services and retail applications, enabling sophisticated decision-making through hierarchical feature learning and non-linear modeling capabilities.