Tree-Based Methods

Tree-based methods create predictive models using decision trees that partition the feature space into regions. In financial services, they enable interpretable credit decisions and fraud detection. In retail, they power customer segmentation and inventory optimization.

Decision Trees

Mathematical Foundation

Decision trees recursively partition the feature space:

$$\mathcal{X} = \bigcup_{m=1}^{M} R_m, \qquad R_m \cap R_{m'} = \emptyset \ \text{ for } m \neq m'$$

Symbol Definitions:

  • $\mathcal{X}$ = Feature space (entire input domain)
  • $R_m$ = Region $m$ (leaf node)
  • $M$ = Number of leaf nodes
  • $\emptyset$ = Empty set (regions don't overlap)

Prediction Function:

$$f(x) = \sum_{m=1}^{M} c_m \, \mathbb{1}(x \in R_m)$$

Symbol Definitions:

  • $f(x)$ = Tree prediction for input $x$
  • $c_m$ = Constant prediction in region $R_m$
  • $\mathbb{1}(x \in R_m)$ = Indicator function (1 if $x$ in region $R_m$, 0 otherwise)

Splitting Criterion

For Regression (Mean Squared Error): a split on feature $j$ at threshold $s$ defines the half-spaces $R_1(j, s) = \{x : x_j \le s\}$ and $R_2(j, s) = \{x : x_j > s\}$.

Optimal Split:

$$\min_{j,\, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]$$

Symbol Definitions:

  • $j$ = Feature index for splitting
  • $s$ = Split threshold value
  • $R_1(j, s)$ = Left region
  • $R_2(j, s)$ = Right region
  • $\hat{c}_1, \hat{c}_2$ = Optimal constants for each region (the mean of $y_i$ within each region)

For Classification (Gini Impurity):

$$G(t) = \sum_{k=1}^{K} p_k(t)\,\bigl(1 - p_k(t)\bigr) = 1 - \sum_{k=1}^{K} p_k(t)^2$$

Symbol Definitions:

  • $t$ = Tree node
  • $K$ = Number of classes
  • $p_k(t)$ = Proportion of class $k$ in node $t$

Information Gain:

$$\Delta I = I(\text{parent}) - \sum_{j} \frac{N_j}{N}\, I(\text{child}_j)$$

Symbol Definitions:

  • $\Delta I$ = Information gain from split
  • $N_j$ = Number of samples in child $j$
  • $N$ = Total samples in parent
  • $I(\cdot)$ = Impurity measure (e.g., Gini or entropy)
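Since these criteria drive every split, a minimal NumPy sketch (function names are ours) that scores one candidate binary split may help:

```python
import numpy as np

def gini(labels: np.ndarray) -> float:
    """Gini impurity G(t) = 1 - sum_k p_k^2 for the labels in one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent: np.ndarray, left: np.ndarray, right: np.ndarray) -> float:
    """Delta I = I(parent) - sum_j (N_j / N) * I(child_j) for a binary split."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# Toy example: 10 loan outcomes (1 = default), split into left/right children
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 1])
y_left, y_right = y[:6], y[6:]          # samples falling on each side of a split
print(f"parent Gini: {gini(y):.3f}")    # 0.480
print(f"information gain: {information_gain(y, y_left, y_right):.3f}")
```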

Financial Services Example: Loan Default Prediction

Business Context: Credit union uses decision tree to make interpretable loan approval decisions while maintaining regulatory compliance and explainable AI requirements.

Features:

  • $x_1$ = Annual income ($000s)
  • $x_2$ = Credit score (300-850)
  • $x_3$ = Debt-to-income ratio (%)
  • $x_4$ = Employment years
  • $x_5$ = Loan amount requested ($000s)

Decision Tree Structure:

Root Node Split:

High Risk Branch:

DTI Ratio Split:

Low Risk Branch:

Decision Rules:
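The numeric split values are not given above, so the sketch below only mirrors the described branch structure (a root split, a DTI split on the high-risk branch, and a low-risk branch); every threshold is a hypothetical placeholder, not the credit union's actual split value:

```python
def approve_loan(income_k: float, credit_score: int, dti_pct: float,
                 employment_yrs: float, loan_amount_k: float) -> str:
    """Hypothetical rule set mirroring the tree structure above.

    All thresholds are illustrative placeholders.
    """
    if credit_score < 650:                # root node split (assumed feature and cutoff)
        # High-risk branch: secondary split on debt-to-income ratio
        if dti_pct > 40:                  # DTI ratio split (assumed cutoff)
            return "decline"
        return "manual_review"
    # Low-risk branch
    if loan_amount_k > 5 * income_k:      # assumed affordability check
        return "manual_review"
    return "approve"

print(approve_loan(income_k=85, credit_score=720, dti_pct=28,
                   employment_yrs=6, loan_amount_k=200))   # -> approve
```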

Business Performance:

  • Accuracy: 82.7% correct predictions
  • Interpretability: Clear decision path for each application
  • Regulatory Compliance: Meets fair lending documentation requirements
  • Processing Speed: 1,000 applications per hour automated
  • Cost Reduction: 60% fewer manual reviews required

Random Forest

Ensemble of Trees

Combines multiple decision trees using bagging:

$$\hat{f}_{\text{RF}}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)$$

Symbol Definitions:

  • $\hat{f}_{\text{RF}}(x)$ = Random Forest prediction
  • $B$ = Number of trees (typically 100-500)
  • $T_b(x)$ = Prediction from tree $b$

Bootstrap Sampling: Each tree $b$ is trained on a bootstrap sample drawn with replacement from the training data:

$$\mathcal{D}_b = \left\{ \bigl(x_i^{(b)}, y_i^{(b)}\bigr) \right\}_{i=1}^{n}, \qquad \bigl(x_i^{(b)}, y_i^{(b)}\bigr) \sim \text{Uniform}(\mathcal{D})$$

Random Feature Selection: At each split, only a random subset of features is considered:

$$m = \lfloor \sqrt{p} \rfloor \ \text{(classification)}, \qquad m = \lfloor p/3 \rfloor \ \text{(regression)}$$

Symbol Definitions:

  • $p$ = Total number of features
  • $m$ = Number of features considered at each split
  • $\lfloor \cdot \rfloor$ = Floor function
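A from-scratch sketch of the averaged-ensemble formula, assuming scikit-learn and toy data (all names ours); RandomForestClassifier bundles this same loop together with the per-split feature subsampling:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

B, n = 100, len(X)
trees = []
for b in range(B):
    idx = rng.integers(0, n, size=n)    # bootstrap sample, drawn with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=b)
    trees.append(tree.fit(X[idx], y[idx]))

# f_RF(x) = (1/B) * sum_b T_b(x): average class-1 probabilities, then threshold
proba = np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)
print("train accuracy:", np.mean((proba > 0.5) == y))
```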

Feature Importance

Gini Importance:

$$\text{Imp}_G(x_j) = \sum_{t \,:\, \text{node } t \text{ splits on } x_j} n_t \,\Delta G(t)$$

Permutation Importance:

$$\text{Imp}_P(x_j) = e_{\pi(j)} - e_{\text{OOB}}$$

Symbol Definitions:

  • $n_t$ = Proportion of samples reaching node $t$
  • $\Delta G(t)$ = Gini improvement from split at node $t$
  • $e_{\pi(j)}$ = Out-of-bag error after permuting feature $j$
  • $e_{\text{OOB}}$ = Original out-of-bag error
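Both importance measures are available in scikit-learn; a minimal sketch on toy data (names ours):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Impurity-based (Gini) importances come for free from training:
print("Gini importances:", rf.feature_importances_.round(3))

# Permutation importance measures the accuracy drop when one feature's values
# are shuffled on held-out data (less biased toward high-cardinality features):
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print("Permutation importances:", perm.importances_mean.round(3))
```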

Retail Example: Customer Segmentation

Business Context: Fashion retailer uses Random Forest to segment customers for targeted marketing campaigns, personalizing offers based on purchase behavior and demographics.

Customer Features:

  • $x_1$ = Average order value ($)
  • $x_2$ = Purchase frequency (orders/year)
  • $x_3$ = Customer tenure (months)
  • $x_4$ = Product category diversity (1-10 scale)
  • $x_5$ = Seasonal purchasing pattern (encoded)
  • $x_6$ = Return rate (%)
  • $x_7$ = Age group (encoded)
  • $x_8$ = Geographic region (encoded)

Target Segments:

  1. High-Value Loyalists (15% of customers, 45% of revenue)
  2. Frequent Shoppers (25% of customers, 35% of revenue)
  3. Occasional Buyers (40% of customers, 15% of revenue)
  4. At-Risk Churners (20% of customers, 5% of revenue)

Random Forest Model:

  • Trees: 200 decision trees
  • Max Depth: 15 levels
  • Min Samples per Leaf: 50 customers
  • Features per Split: 3 (√8 ≈ 2.8, rounded to 3)
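In scikit-learn terms, the stated configuration would look roughly like this (the feature matrix X and segment labels y are assumed to exist):

```python
from sklearn.ensemble import RandomForestClassifier

segmenter = RandomForestClassifier(
    n_estimators=200,        # Trees: 200 decision trees
    max_depth=15,            # Max Depth: 15 levels
    min_samples_leaf=50,     # Min Samples per Leaf: 50 customers
    max_features=3,          # Features per Split: 3 of the 8 features
    random_state=0,
)
# segmenter.fit(X, y)        # X: customer features, y: segment labels
```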

Feature Importance Ranking:

  1. Average Order Value (0.28) - Primary value indicator
  2. Purchase Frequency (0.22) - Engagement level
  3. Customer Tenure (0.18) - Loyalty measure
  4. Product Diversity (0.15) - Shopping breadth
  5. Return Rate (0.08) - Satisfaction proxy
  6. Age Group (0.05) - Demographic factor
  7. Seasonal Pattern (0.03) - Timing behavior
  8. Geographic Region (0.01) - Location influence

Segmentation Results:

High-Value Loyalists:

  • Average Order Value: $180+
  • Purchase Frequency: 8+ times/year
  • Customer Tenure: 24+ months
  • Product Diversity: 7+ categories

Marketing Strategy per Segment: campaign budget is allocated in proportion to each segment's expected return:

$$\text{Budget}_s \propto \text{CLV}_s \times r_s \times e$$

Symbol Definitions:

  • $\text{CLV}_s$ = Customer Lifetime Value for segment $s$
  • $r_s$ = Historical campaign response rate
  • $e$ = Budget allocation efficiency factor
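A toy illustration of proportional allocation under this formula; all segment numbers below are invented for the example:

```python
# Hypothetical CLV, response rates, and efficiency factor (made-up numbers):
segments = ["High-Value Loyalists", "Frequent Shoppers",
            "Occasional Buyers", "At-Risk Churners"]
clv = [1200.0, 600.0, 180.0, 90.0]   # assumed CLV_s per segment ($)
resp = [0.12, 0.08, 0.03, 0.02]      # assumed response rates r_s
e = 1.0                              # efficiency factor (held constant here)

scores = [c * r * e for c, r in zip(clv, resp)]
total = sum(scores)
for name, s in zip(segments, scores):
    print(f"{name}: {s / total:.1%} of campaign budget")
```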

Business Impact:

  • Campaign ROI: 340% vs. 180% for mass marketing
  • Customer Retention: 23% improvement in high-value segment
  • Cross-selling Success: 45% increase in product diversity
  • Revenue Growth: $2.8M additional quarterly revenue

Gradient Boosting

Boosting Algorithm

Sequentially builds trees to correct previous errors:

$$F_M(x) = F_0(x) + \sum_{m=1}^{M} \nu_m\, h_m(x)$$

Symbol Definitions:

  • $F_M(x)$ = Final boosted prediction
  • $M$ = Number of boosting iterations
  • $\nu_m$ = Learning rate for iteration $m$
  • $h_m(x)$ = Weak learner (tree) at iteration $m$

Gradient Boosting Algorithm:

Step 1: Initialize with constant prediction

$$F_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)$$

Step 2: For $m = 1, 2, \ldots, M$:

Compute negative gradients:

$$r_{im} = -\left[ \frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} \right]_{F = F_{m-1}}$$

Fit weak learner: train a regression tree $h_m(x)$ on the targets $\{(x_i, r_{im})\}_{i=1}^{n}$

Update model:

$$F_m(x) = F_{m-1}(x) + \nu\, h_m(x)$$

Symbol Definitions:

  • $L(y, F(x))$ = Loss function
  • $r_{im}$ = Residual (negative gradient) for sample $i$ at iteration $m$
  • $\nu$ = Learning rate (shrinkage parameter)
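A from-scratch sketch of the algorithm for squared-error loss, where the negative gradient reduces to the ordinary residual (toy data and hyperparameters are ours):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

M, nu = 100, 0.1                    # boosting iterations, learning rate
F = np.full_like(y, y.mean())       # Step 1: F_0 = constant (mean minimizes MSE)
learners = []
for m in range(M):                  # Step 2
    r = y - F                       # negative gradient of squared loss = residual
    h = DecisionTreeRegressor(max_depth=2).fit(X, r)   # fit weak learner to residuals
    F += nu * h.predict(X)          # F_m = F_{m-1} + nu * h_m
    learners.append(h)

print("train MSE:", np.mean((y - F) ** 2))
```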

XGBoost (Extreme Gradient Boosting)

Advanced gradient boosting with regularization:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

Regularization Term:

$$\Omega(f) = \gamma T + \frac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^2$$

Symbol Definitions:

  • $\mathcal{L}^{(t)}$ = Loss at iteration $t$
  • $l$ = Differentiable loss function
  • $\hat{y}_i^{(t-1)}$ = Prediction from previous iterations
  • $f_t$ = New tree at iteration $t$
  • $T$ = Number of leaves in tree
  • $w_j$ = Weight of leaf $j$
  • $\gamma, \lambda$ = Regularization parameters
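One step worth making explicit: XGBoost minimizes a second-order Taylor approximation of this objective. With $g_i$ and $h_i$ the first and second derivatives of $l$ at $\hat{y}_i^{(t-1)}$, and $G_j = \sum_{i \in \text{leaf } j} g_i$, $H_j = \sum_{i \in \text{leaf } j} h_i$, the optimal leaf weights and objective value have closed forms:

$$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad \mathcal{L}^{(t)*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$

This is what makes split evaluation cheap: the gain of a candidate split is simply the change in this expression.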

Financial Services Example: Fraud Detection

Business Context: Payment processor uses XGBoost to detect fraudulent transactions in real-time, balancing detection accuracy with false positive rates to maintain customer experience.

Transaction Features:

  • $x_1$ = Transaction amount (log-scaled)
  • $x_2$ = Time since last transaction (minutes)
  • $x_3$ = Merchant category risk score
  • $x_4$ = Geographic velocity (distance/time)
  • $x_5$ = Card-not-present indicator
  • $x_6$ = Historical decline rate for merchant
  • $x_7$ = Account age (days)
  • $x_8$ = Spending pattern deviation score

XGBoost Configuration:

  • Objective: Binary logistic regression
  • Trees: 500 estimators
  • Max Depth: 6 levels
  • Learning Rate: 0.1
  • Regularization: α=0.1, λ=1.0

Class Imbalance Handling: fraud cases are rare, so positive samples are up-weighted during training rather than treated uniformly.

Custom Loss Function:

$$L = -\sum_{i=1}^{n} w_i \left[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \right]$$

Symbol Definitions:

  • $w_i$ = Sample weight (higher for fraud cases)
  • $\hat{p}_i$ = Predicted fraud probability
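A sketch of the stated configuration in XGBoost's scikit-learn API; the training arrays and class counts are assumptions, and scale_pos_weight is one common way to realize the fraud up-weighting described above:

```python
import xgboost as xgb

n_legit, n_fraud = 990_000, 10_000       # assumed class counts (~1% fraud)
model = xgb.XGBClassifier(
    objective="binary:logistic",         # Objective: binary logistic
    n_estimators=500,                    # Trees: 500 estimators
    max_depth=6,                         # Max Depth: 6 levels
    learning_rate=0.1,                   # Learning Rate: 0.1
    reg_alpha=0.1, reg_lambda=1.0,       # Regularization: alpha=0.1, lambda=1.0
    scale_pos_weight=n_legit / n_fraud,  # up-weight the rare fraud class
)
# model.fit(X_train, y_train)            # assumed training data, y=1 for fraud
# fraud_prob = model.predict_proba(X_test)[:, 1]
```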

Feature Importance:

  1. Amount Log-Scale (0.24) - Primary fraud indicator
  2. Geographic Velocity (0.19) - Location anomaly
  3. Merchant Risk Score (0.16) - Business type risk
  4. Pattern Deviation (0.15) - Behavioral anomaly
  5. Card-not-Present (0.12) - Transaction type
  6. Time Velocity (0.08) - Timing patterns
  7. Account Age (0.04) - Account maturity
  8. Decline History (0.02) - Merchant reputation

Decision Thresholds:
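A hypothetical tiered policy of the kind such a score feeds into; the cutoffs are illustrative placeholders, not the processor's actual thresholds:

```python
def route_transaction(fraud_prob: float) -> str:
    """Hypothetical three-tier decision policy with placeholder cutoffs."""
    if fraud_prob >= 0.90:
        return "decline"          # assumed high-risk cutoff
    if fraud_prob >= 0.50:
        return "manual_review"    # assumed review band
    return "approve"
```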

Business Performance:

  • Fraud Detection Rate: 94.2% (vs. 87.3% rule-based)
  • False Positive Rate: 0.8% (vs. 1.2% previous)
  • Processing Speed: 15ms average response time
  • Cost Savings: $45M in annual fraud prevention
  • Customer Impact: 35% fewer legitimate transactions declined

Model Interpretability

SHAP (SHapley Additive exPlanations)

Explains individual predictions by attributing the model output to each feature:

$$\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!} \left[ f_{S \cup \{j\}}\bigl(x_{S \cup \{j\}}\bigr) - f_S(x_S) \right]$$

Symbol Definitions:

  • $\phi_j$ = SHAP value for feature $j$
  • $S$ = Subset of features excluding $j$
  • $F$ = Set of all features
  • $f_S(x_S)$ = Model prediction using only features in set $S$

TreeSHAP for Tree-Based Models: computes exact SHAP values for decision trees and ensembles in polynomial time, avoiding the exponential subset enumeration above.
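A minimal sketch with the shap package, assuming a fitted tree model `model` and feature matrix `X` from the examples above:

```python
import shap  # pip install shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # one phi_j per feature per sample

# Attribution for the first prediction: base value + sum of SHAP values
# recovers the model output for that sample.
print("base value:", explainer.expected_value)
print("per-feature contributions:", shap_values[0])
```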

Partial Dependence Plots

Shows the marginal effect of features on predictions:

$$\hat{f}_S(x_S) = \mathbb{E}_{x_C}\!\left[ f(x_S, x_C) \right] \approx \frac{1}{n} \sum_{i=1}^{n} f\!\left(x_S, x_C^{(i)}\right)$$

Symbol Definitions:

  • $\hat{f}_S(x_S)$ = Partial dependence of features $S$
  • $x_S$ = Features of interest
  • $x_C$ = Complementary features
  • $\mathbb{E}_{x_C}$ = Expectation over complementary features
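scikit-learn ships a utility that computes this average directly; a sketch assuming a fitted estimator `model` and feature matrix `X`:

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Partial dependence of the prediction on features 0 and 1: for each grid
# value of x_S, predictions are averaged over the empirical distribution of x_C.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
plt.show()
```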

Advanced Tree Techniques

LightGBM

Gradient boosting with leaf-wise tree growth:

Leaf-wise Growth: Grows trees by adding leaves that reduce loss most

Categorical Feature Handling: Direct support without one-hot encoding

Memory Efficiency: Optimized data structures for large datasets
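A minimal sketch (toy data, names ours) showing both headline features: num_leaves controls leaf-wise growth, and pandas categorical columns need no one-hot encoding:

```python
import lightgbm as lgb
import pandas as pd

df = pd.DataFrame({
    "amount": [12.0, 95.0, 7.5, 230.0],
    "region": pd.Categorical(["EU", "US", "EU", "APAC"]),  # categorical feature
})
y = [0, 1, 0, 1]

model = lgb.LGBMClassifier(num_leaves=31, n_estimators=50, min_child_samples=1)
model.fit(df, y)           # categorical dtype is detected and handled natively
print(model.predict(df))
```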

CatBoost

Handles categorical features automatically:

Ordered Target Statistics:

$$\hat{x}_i^k = \frac{\sum_{j<i} \mathbb{1}\bigl(x_j^k = x_i^k\bigr)\, y_j + a\,P}{\sum_{j<i} \mathbb{1}\bigl(x_j^k = x_i^k\bigr) + a}$$

Symbol Definitions:

  • $\hat{x}_i^k$ = Target statistic for categorical feature $k$ at sample $i$ (computed only from samples earlier in a random permutation, which prevents target leakage)
  • $a$ = Smoothing parameter
  • $P$ = Prior (e.g., the global target mean)
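A minimal sketch (toy data, names ours): raw string categories are declared via cat_features and encoded internally with ordered target statistics, so no manual preprocessing step is needed:

```python
from catboost import CatBoostClassifier

X = [["EU", 12.0], ["US", 95.0], ["EU", 7.5], ["APAC", 230.0]]
y = [0, 1, 0, 1]

# Column 0 holds raw category strings; CatBoost encodes it internally.
model = CatBoostClassifier(iterations=50, verbose=False, cat_features=[0])
model.fit(X, y)
print(model.predict(X))
```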

Tree-based methods provide powerful, interpretable solutions for complex prediction problems: they deliver excellent performance on structured data while retaining the explainability that financial services and retail applications demand.