Neural Network Fundamentals

Neural networks form the foundation of deep learning, mimicking biological neurons to process information. In financial services, they power fraud detection systems and credit scoring models. In retail, they enable recommendation engines and inventory optimization.

Mathematical Foundation

Perceptron Model

The basic building block of neural networks computes a weighted sum of its inputs plus a bias, then applies a non-linearity:

$$a = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Symbol Definitions:

  • $a$ = Output activation (neuron's response)
  • $w_i$ = Weight for input $x_i$ (connection strength)
  • $x_i$ = i-th input feature (data value)
  • $b$ = Bias term (threshold adjustment)
  • $f$ = Activation function (non-linearity)
  • $n$ = Number of input features
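
As a minimal sketch of this formula (assuming NumPy, with a sigmoid non-linearity chosen purely for illustration):

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b, f=sigmoid):
    """Single neuron: a = f(sum_i w_i * x_i + b)."""
    return f(np.dot(w, x) + b)

# Hypothetical values, not from the text
x = np.array([0.5, -1.2, 3.0])   # n = 3 input features
w = np.array([0.8, 0.1, -0.4])   # one weight per input
b = 0.2                          # bias term
print(perceptron(x, w, b))       # a single activation in (0, 1)
```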

Multilayer Perceptron (MLP)

Stacked layers create complex decision boundaries. Each layer applies an affine transform to the previous layer's output, followed by a non-linearity:

$$a^{(l)} = f^{(l)}\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$$

Symbol Definitions:

  • $a^{(l)}$ = Activation vector at layer $l$ (layer outputs)
  • $W^{(l)}$ = Weight matrix for layer $l$ (learned parameters)
  • $b^{(l)}$ = Bias vector for layer $l$ (layer thresholds)
  • $f^{(l)}$ = Activation function for layer $l$
  • $l$ = Layer index (0 = input, L = output)
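
The same recurrence written as a loop over layers (a sketch assuming NumPy; parameter shapes are left to the caller):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def mlp_forward(x, weights, biases, activations):
    """Apply a(l) = f(l)(W(l) a(l-1) + b(l)) layer by layer."""
    a = x  # a(0): the input vector
    for W, b, f in zip(weights, biases, activations):
        a = f(W @ a + b)
    return a  # a(L): the network output
```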

Activation Functions

Sigmoid Function

Smooth S-shaped curve for binary classification:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Symbol Definitions:

  • $\sigma(z)$ = Sigmoid output (value between 0 and 1)
  • $z$ = Linear combination of inputs (pre-activation)
  • $e$ = Euler's number (≈ 2.718)

Properties:

  • Range: $(0, 1)$ - suitable for probabilities
  • Differentiable everywhere - enables gradient descent

ReLU (Rectified Linear Unit)

Most popular activation for hidden layers:

$$\text{ReLU}(z) = \max(0, z)$$

Symbol Definitions:

  • $\text{ReLU}(z)$ = ReLU output (non-negative values)
  • $\max$ = Maximum function (selects larger value)

Advantages:

  • Computationally efficient (simple max operation)
  • Reduces vanishing gradient problem
  • Sparse activation (many neurons output zero)
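
To make the vanishing-gradient contrast concrete, a small sketch (NumPy assumed) comparing the two derivatives:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)            # peaks at 0.25, vanishes for large |z|

def relu_grad(z):
    return (z > 0).astype(float)  # exactly 1 for all positive inputs

z = np.array([-10.0, -1.0, 0.5, 10.0])
print(sigmoid_grad(z))  # [~0.00005, 0.197, 0.235, ~0.00005]
print(relu_grad(z))     # [0., 0., 1., 1.]
```

Sigmoid gradients shrink toward zero at both extremes, so deep stacks of sigmoids multiply many small numbers together; ReLU passes the gradient through unchanged wherever the neuron is active.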

Financial Services Example: Credit Card Fraud Detection

Business Context: A financial institution uses neural networks to detect fraudulent credit card transactions in real-time, protecting customers and reducing financial losses.

Input Features:

  • $x_1$ = Transaction amount (dollars)
  • $x_2$ = Time since last transaction (minutes)
  • $x_3$ = Merchant category code (encoded)
  • $x_4$ = Geographic distance from previous transaction (miles)
  • $x_5$ = Account age (days)

Network Architecture:

  • Input Layer: 5 neurons (features)
  • Hidden Layer 1: 20 neurons with ReLU activation
  • Hidden Layer 2: 10 neurons with ReLU activation
  • Output Layer: 1 neuron with sigmoid activation (fraud probability)

Forward Propagation:

Layer 1 (Hidden):

$$h^{(1)} = \text{ReLU}\left(W^{(1)} x + b^{(1)}\right)$$

Layer 2 (Hidden):

$$h^{(2)} = \text{ReLU}\left(W^{(2)} h^{(1)} + b^{(2)}\right)$$

Output Layer:

$$\hat{y} = \sigma\left(W^{(3)} h^{(2)} + b^{(3)}\right)$$

Symbol Definitions:

  • $x$ = Input feature vector (transaction data)
  • $h^{(1)}, h^{(2)}$ = Hidden layer activations
  • $\hat{y}$ = Predicted fraud probability (0 to 1)
  • $\sigma$ = Sigmoid activation function
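
A forward-pass sketch of this 5-20-10-1 architecture (NumPy assumed; weights are randomly initialized here purely for illustration, and real inputs would be normalized first):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized parameters matching the architecture above
W1, b1 = rng.normal(0, 0.1, (20, 5)),  np.zeros(20)
W2, b2 = rng.normal(0, 0.1, (10, 20)), np.zeros(10)
W3, b3 = rng.normal(0, 0.1, (1, 10)),  np.zeros(1)

def fraud_score(x):
    """Two ReLU hidden layers, sigmoid output = fraud probability."""
    h1 = relu(W1 @ x + b1)
    h2 = relu(W2 @ h1 + b2)
    return sigmoid(W3 @ h2 + b3)

# Hypothetical transaction: [amount, minutes since last, merchant code,
# distance from previous transaction, account age]
x = np.array([250.0, 3.0, 12.0, 800.0, 90.0])
print(fraud_score(x))  # probability in (0, 1)
```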

Loss Function (Binary Cross-Entropy):

$$L = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]$$

Symbol Definitions:

  • $L$ = Average loss across batch (prediction error)
  • $m$ = Batch size (number of transactions)
  • $y_i$ = True label for transaction $i$ (0=legitimate, 1=fraud)
  • $\hat{y}_i$ = Predicted probability for transaction $i$
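
The loss in code, with the standard numerical guard against log(0) (a sketch, NumPy assumed):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average BCE over a batch; eps keeps log() finite."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([0, 0, 1, 0])              # one fraudulent transaction
y_pred = np.array([0.05, 0.10, 0.90, 0.20])  # model probabilities
print(binary_cross_entropy(y_true, y_pred))  # ≈ 0.121
```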

Business Impact:

  • Detection Rate: 95% fraud detection with 0.1% false positive rate
  • Response Time: Sub-second decision making for real-time blocking
  • Cost Savings: $50M in annual fraud losses prevented

Retail Example: Customer Lifetime Value Prediction

Business Context: A retail chain predicts customer lifetime value (CLV) to optimize marketing spend and customer acquisition strategies.

Input Features:

  • $x_1$ = Average order value (dollars)
  • $x_2$ = Purchase frequency (orders per month)
  • $x_3$ = Customer tenure (months since first purchase)
  • $x_4$ = Product category diversity (number of categories)
  • $x_5$ = Returns ratio (returns/purchases)
  • $x_6$ = Seasonal purchasing pattern (encoded)

Network Architecture:

  • Input Layer: 6 neurons
  • Hidden Layer 1: 32 neurons with ReLU
  • Hidden Layer 2: 16 neurons with ReLU
  • Hidden Layer 3: 8 neurons with ReLU
  • Output Layer: 1 neuron with linear activation (CLV prediction)

Mathematical Formulation:

Hidden Layers:

$$h^{(l)} = \text{ReLU}\left(W^{(l)} h^{(l-1)} + b^{(l)}\right), \quad l = 1, 2, 3, \quad h^{(0)} = x$$

Output (CLV Prediction):

$$\hat{y} = W^{(4)} h^{(3)} + b^{(4)}$$

Loss Function (Mean Squared Error):

$$L = \frac{1}{m} \sum_{i=1}^{m} \left(y_i - \hat{y}_i\right)^2$$

Symbol Definitions:

  • $\hat{y}_i$ = Predicted customer lifetime value (dollars)
  • $y_i$ = Actual CLV for customer $i$
  • $L$ = Mean squared error loss
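
If a deep-learning framework is available, one way to express this architecture is a Keras Sequential model (a sketch; layer sizes mirror the list above, while data loading, scaling, and training are not shown):

```python
import tensorflow as tf

# 6-32-16-8-1 regression network for CLV prediction
model = tf.keras.Sequential([
    tf.keras.Input(shape=(6,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),  # linear output for a dollar-valued target
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(X_train, y_train, epochs=50, batch_size=64)  # data not shown
```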

Business Applications:

1. Customer Segmentation: customers are grouped into value tiers (e.g., high, medium, low) by predicted CLV, so retention and loyalty programs target the most valuable segments.

2. Marketing Budget Allocation: per-customer spend is capped at a fraction of expected converted value:

$$\text{Budget}_i = \alpha \cdot \hat{y}_i \cdot p_i$$

Symbol Definitions:

  • $\alpha$ = Marketing efficiency coefficient (0.1 = 10% of CLV)
  • $p_i$ = Likelihood of successful conversion
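
A one-line sketch of the allocation rule as reconstructed above (the example values are hypothetical):

```python
def marketing_budget(clv_pred, p_convert, alpha=0.1):
    """Spend at most alpha (here 10%) of expected converted value."""
    return alpha * clv_pred * p_convert

print(marketing_budget(clv_pred=2400.0, p_convert=0.3))  # 72.0 dollars
```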

Business Impact:

  • Prediction Accuracy: R² = 0.87 for CLV prediction
  • Marketing ROI: 40% improvement through targeted spending
  • Customer Acquisition: 25% more efficient budget allocation

Training Process

Gradient Descent Optimization

Weight updates based on loss gradient:

$$W \leftarrow W - \eta \nabla_W L$$

Symbol Definitions:

  • $\eta$ = Learning rate (step size parameter)
  • $\nabla_W L$ = Gradient of loss w.r.t. weights
  • $\leftarrow$ = Assignment operator (weight update)
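
A minimal sketch of the update loop on a toy quadratic loss (NumPy assumed):

```python
import numpy as np

def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    """Repeatedly apply W <- W - eta * grad(L)(W)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

# Toy loss L(w) = ||w - 3||^2, gradient 2(w - 3), minimum at w = 3
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=[0.0, 10.0])
print(w_star)  # ≈ [3. 3.]
```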

Backpropagation Algorithm

Efficient gradient computation using the chain rule. The error signal at layer $l$ is computed backward from the error at layer $l+1$:

$$\delta^{(l)} = \left(\left(W^{(l+1)}\right)^T \delta^{(l+1)}\right) \odot f'^{(l)}\left(z^{(l)}\right)$$

and each layer's weight gradient then follows as $\nabla_{W^{(l)}} L = \delta^{(l)} \left(a^{(l-1)}\right)^T$.

Symbol Definitions:

  • $\delta^{(l)}$ = Error term for layer $l$ (gradient signal)
  • $z^{(l)}$ = Pre-activation values at layer $l$
  • $\odot$ = Element-wise multiplication (Hadamard product)
  • $f'^{(l)}$ = Derivative of activation function at layer $l$
  • $\left(W^{(l+1)}\right)^T$ = Transpose of weight matrix
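
A sketch of one backward pass for a single hidden layer (NumPy assumed). With a sigmoid output and binary cross-entropy, the output error simplifies to ŷ − y, a standard identity:

```python
import numpy as np

def backward(x, y, W1, b1, W2, b2):
    """One forward/backward pass; returns layer gradients."""
    # Forward
    z1 = W1 @ x + b1
    h1 = np.maximum(0, z1)                     # ReLU hidden layer
    y_hat = 1 / (1 + np.exp(-(W2 @ h1 + b2)))  # sigmoid output
    # Backward
    d2 = y_hat - y                # output delta (sigmoid + BCE shortcut)
    d1 = (W2.T @ d2) * (z1 > 0)   # delta(1) = (W2^T delta(2)) ⊙ ReLU'(z1)
    dW2, db2 = np.outer(d2, h1), d2
    dW1, db1 = np.outer(d1, x), d1
    return dW1, db1, dW2, db2
```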

Regularization Techniques

L2 Regularization (Weight Decay)

Prevents overfitting by penalizing large weights:

$$L_{\text{total}} = L + \frac{\lambda}{2} \sum_{l} \left\|W^{(l)}\right\|_F^2$$

Symbol Definitions:

  • $\lambda$ = Regularization strength (penalty coefficient)
  • $\left\|W^{(l)}\right\|_F^2$ = Frobenius norm squared (sum of squared weights)
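
The penalty term in code (a sketch, NumPy assumed); its gradient contributes λW to each layer's weight gradient, which is why L2 regularization is also called weight decay:

```python
import numpy as np

def l2_penalty(weight_matrices, lam=1e-4):
    """(lambda / 2) * sum of squared entries across all layers."""
    return 0.5 * lam * sum(np.sum(W ** 2) for W in weight_matrices)
```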

Dropout

Randomly sets activations to zero during training. Each neuron's output is multiplied by an independent binary mask:

$$\tilde{a}^{(l)} = m \odot a^{(l)}$$

Symbol Definitions:

  • $m$ = Random binary mask (0 or 1 for each neuron)
  • $\tilde{a}^{(l)}$ = Dropout-applied activations
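
A sketch of the widely used "inverted" dropout variant (not specified in the text, but standard): survivors are rescaled during training so no adjustment is needed at inference time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p=0.5, training=True):
    """Zero each activation with probability p; rescale the rest."""
    if not training:
        return a                  # inference: use activations as-is
    mask = (rng.random(a.shape) >= p).astype(a.dtype)
    return a * mask / (1.0 - p)   # inverted scaling keeps E[a] unchanged
```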

Performance Metrics

Classification Metrics

For fraud detection and binary classification:

Precision:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall (Sensitivity):

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1-Score:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where $TP$, $FP$, and $FN$ count true positives, false positives, and false negatives.
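
All three metrics computed from 0/1 label arrays (a sketch, NumPy assumed):

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```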

Regression Metrics

For CLV prediction and continuous values:

Mean Absolute Error (MAE):

$$\text{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left|y_i - \hat{y}_i\right|$$

Root Mean Squared Error (RMSE):

$$\text{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left(y_i - \hat{y}_i\right)^2}$$
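
Both metrics in code, with hypothetical CLV values (NumPy assumed):

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

y     = np.array([1200.0, 300.0, 850.0])  # hypothetical actual CLVs
y_hat = np.array([1100.0, 420.0, 900.0])  # hypothetical predictions
print(mae(y, y_hat))    # 90.0
print(rmse(y, y_hat))   # ≈ 94.7
```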

Neural networks provide a flexible framework for learning complex patterns in both financial services and retail applications, enabling sophisticated decision-making through hierarchical feature learning and non-linear modeling capabilities.