Tree-Based Methods
Tree-based methods create predictive models using decision trees that partition the feature space into regions. In financial services, they enable interpretable credit decisions and fraud detection. In retail, they power customer segmentation and inventory optimization.
Decision Trees
Mathematical Foundation
Decision trees recursively partition the feature space into disjoint regions:

$$\mathcal{X} = \bigcup_{m=1}^{M} R_m, \qquad R_m \cap R_{m'} = \emptyset \ \text{for } m \neq m'$$

Symbol Definitions:
- $\mathcal{X}$ = Feature space (entire input domain)
- $R_m$ = Region $m$ (leaf node)
- $M$ = Number of leaf nodes
- $\emptyset$ = Empty set (regions don't overlap)

Prediction Function:

$$\hat{f}(x) = \sum_{m=1}^{M} c_m \, \mathbb{1}(x \in R_m)$$

Symbol Definitions:
- $\hat{f}(x)$ = Tree prediction for input $x$
- $c_m$ = Constant prediction in region $R_m$
- $\mathbb{1}(x \in R_m)$ = Indicator function (1 if $x$ is in region $R_m$, 0 otherwise)
Splitting Criterion
For Regression (Mean Squared Error): choose the split that minimizes the total squared error of the two resulting regions.

Optimal Split:

$$(j^*, s^*) = \arg\min_{j,\, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j, s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j, s)} (y_i - c_2)^2 \right]$$

where $R_1(j, s) = \{x \mid x_j \le s\}$ and $R_2(j, s) = \{x \mid x_j > s\}$, and the inner minimizations are solved by the region means $\hat{c}_1$ and $\hat{c}_2$.

Symbol Definitions:
- $j$ = Feature index for splitting
- $s$ = Split threshold value
- $R_1(j, s)$ = Left region
- $R_2(j, s)$ = Right region
- $\hat{c}_1, \hat{c}_2$ = Optimal constants for each region
For Classification (Gini Impurity):

$$G(t) = \sum_{k=1}^{K} p_k(t)\,\bigl(1 - p_k(t)\bigr) = 1 - \sum_{k=1}^{K} p_k(t)^2$$

Symbol Definitions:
- $t$ = Tree node
- $K$ = Number of classes
- $p_k(t)$ = Proportion of class $k$ in node $t$
Information Gain:

$$\Delta I = I(\text{parent}) - \sum_{c \,\in\, \text{children}} \frac{N_c}{N}\, I(c)$$

Symbol Definitions:
- $\Delta I$ = Information gain from split
- $N_c$ = Number of samples in child $c$
- $N$ = Total samples in parent
- $I(\cdot)$ = Impurity measure (Gini impurity or entropy)
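As a concrete illustration, the short Python sketch below computes Gini impurity and information gain for a toy candidate split; the labels and the split itself are made up purely for illustration.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    """Impurity reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

# Toy example: binary labels separated by a candidate threshold
parent = np.array([0, 0, 0, 1, 1, 1, 1, 0])
left = np.array([0, 0, 0, 0])    # samples with x_j <= s
right = np.array([1, 1, 1, 1])   # samples with x_j > s
print(gini(parent))                             # 0.5
print(information_gain(parent, left, right))    # 0.5 (a perfect split)
```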
Financial Services Example: Loan Default Prediction
Business Context: A credit union uses a decision tree to make interpretable loan approval decisions while meeting regulatory compliance and explainable-AI requirements.
Features:
- $x_1$ = Annual income ($000s)
- $x_2$ = Credit score (300-850)
- $x_3$ = Debt-to-income ratio (%)
- $x_4$ = Employment years
- $x_5$ = Loan amount requested ($000s)
Decision Tree Structure: the root node split separates applications into a high-risk branch and a low-risk branch; the high-risk branch is refined by a further split on the DTI ratio, and each root-to-leaf path yields an explicit approve/decline decision rule.
Business Performance:
- Accuracy: 82.7% correct predictions
- Interpretability: Clear decision path for each application
- Regulatory Compliance: Meets fair lending documentation requirements
- Processing Speed: 1,000 applications per hour automated
- Cost Reduction: 60% fewer manual reviews required
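A minimal scikit-learn sketch of this kind of interpretable approval model is shown below; the synthetic data, feature names, and thresholds are hypothetical stand-ins, not the credit union's actual model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical feature matrix: [income_k, credit_score, dti_pct, employment_yrs, loan_amount_k]
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(65, 20, 500),      # annual income ($000s)
    rng.integers(300, 851, 500),  # credit score
    rng.uniform(5, 60, 500),      # debt-to-income ratio (%)
    rng.uniform(0, 30, 500),      # employment years
    rng.normal(25, 10, 500),      # loan amount requested ($000s)
])
# Synthetic default labels: higher DTI and lower credit score raise default risk
y = ((X[:, 2] > 40) & (X[:, 1] < 650)).astype(int)

# A shallow tree keeps each decision path short enough to document for auditors
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=25, criterion="gini")
tree.fit(X, y)

# Human-readable rules: one root-to-leaf path per approve/decline decision
feature_names = ["income_k", "credit_score", "dti_pct", "employment_yrs", "loan_amount_k"]
print(export_text(tree, feature_names=feature_names))
```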
Random Forest
Ensemble of Trees
Combines multiple decision trees using bagging:

$$\hat{f}_{RF}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)$$

Symbol Definitions:
- $\hat{f}_{RF}(x)$ = Random Forest prediction
- $B$ = Number of trees (typically 100-500)
- $\hat{f}_b(x)$ = Prediction from tree $b$

Bootstrap Sampling: Each tree is trained on a bootstrap sample $\mathcal{D}_b$ drawn with replacement from the training data $\mathcal{D}$, so each tree sees a different random subset of the data.

Random Feature Selection: At each split, only a random subset of $m$ features is considered:

$$m = \lfloor \sqrt{p} \rfloor \ \ \text{(classification)}, \qquad m = \lfloor p/3 \rfloor \ \ \text{(regression)}$$

Symbol Definitions:
- $p$ = Total number of features
- $m$ = Number of features considered at each split
- $\lfloor \cdot \rfloor$ = Floor function
Feature Importance
Gini Importance:

$$\text{Imp}(x_j) = \sum_{t \,:\, \text{split on } x_j} p(t)\, \Delta G(t)$$

Permutation Importance:

$$\text{Imp}(x_j) = e^{OOB}_{\pi(j)} - e^{OOB}$$

Symbol Definitions:
- $p(t)$ = Proportion of samples reaching node $t$
- $\Delta G(t)$ = Gini improvement from the split at node $t$
- $e^{OOB}_{\pi(j)}$ = Out-of-bag error after permuting feature $j$
- $e^{OOB}$ = Original out-of-bag error
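The scikit-learn snippet below sketches both importance measures; the synthetic dataset and hyperparameters are placeholders, not a recommended configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; any tabular (X, y) works the same way
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=300,     # B: number of trees
    max_features="sqrt",  # m = floor(sqrt(p)) features considered per split
    oob_score=True,       # track out-of-bag accuracy
    random_state=0,
).fit(X_train, y_train)

print("OOB accuracy:", rf.oob_score_)
print("Gini importances:", rf.feature_importances_)

# Permutation importance: drop in score after shuffling each feature on held-out data
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print("Permutation importances:", perm.importances_mean)
```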
Retail Example: Customer Segmentation
Business Context: A fashion retailer uses a Random Forest to segment customers for targeted marketing campaigns, personalizing offers based on purchase behavior and demographics.
Customer Features:
- $x_1$ = Average order value ($)
- $x_2$ = Purchase frequency (orders/year)
- $x_3$ = Customer tenure (months)
- $x_4$ = Product category diversity (1-10 scale)
- $x_5$ = Seasonal purchasing pattern (encoded)
- $x_6$ = Return rate (%)
- $x_7$ = Age group (encoded)
- $x_8$ = Geographic region (encoded)
Target Segments:
- High-Value Loyalists (15% of customers, 45% of revenue)
- Frequent Shoppers (25% of customers, 35% of revenue)
- Occasional Buyers (40% of customers, 15% of revenue)
- At-Risk Churners (20% of customers, 5% of revenue)
Random Forest Model:
- Trees: 200 decision trees
- Max Depth: 15 levels
- Min Samples per Leaf: 50 customers
- Features per Split: 3 (√8 ≈ 2.8, rounded up to 3)
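The sketch below maps this configuration onto scikit-learn hyperparameters; the variable names (X_customers, y_segment) are hypothetical placeholders for the retailer's actual data.

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative mapping of the configuration above onto scikit-learn arguments
segment_model = RandomForestClassifier(
    n_estimators=200,         # Trees: 200 decision trees
    max_depth=15,             # Max Depth: 15 levels
    min_samples_leaf=50,      # Min Samples per Leaf: 50 customers
    max_features=3,           # Features per Split: 3 of the 8 customer features
    class_weight="balanced",  # segment sizes are imbalanced (15% / 25% / 40% / 20%)
    random_state=42,
)
# segment_model.fit(X_customers, y_segment)  # X_customers, y_segment are hypothetical
```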
Feature Importance Ranking:
- Average Order Value (0.28) - Primary value indicator
- Purchase Frequency (0.22) - Engagement level
- Customer Tenure (0.18) - Loyalty measure
- Product Diversity (0.15) - Shopping breadth
- Return Rate (0.08) - Satisfaction proxy
- Age Group (0.05) - Demographic factor
- Seasonal Pattern (0.03) - Timing behavior
- Geographic Region (0.01) - Location influence
Segmentation Results:
High-Value Loyalists:
- Average Order Value: $180+
- Purchase Frequency: 8+ times/year
- Customer Tenure: 24+ months
- Product Diversity: 7+ categories
Marketing Strategy per Segment: campaign budget is allocated across segments based on three quantities:

Symbol Definitions:
- $\text{CLV}_s$ = Customer Lifetime Value for segment $s$
- $r_s$ = Historical campaign response rate
- $\eta_s$ = Budget allocation efficiency factor
Business Impact:
- Campaign ROI: 340% vs. 180% for mass marketing
- Customer Retention: 23% improvement in high-value segment
- Cross-selling Success: 45% increase in product diversity
- Revenue Growth: $2.8M additional quarterly revenue
Gradient Boosting
Boosting Algorithm
Sequentially builds trees to correct previous errors:

$$F_M(x) = F_0(x) + \sum_{m=1}^{M} \nu_m h_m(x)$$

Symbol Definitions:
- $F_M(x)$ = Final boosted prediction
- $M$ = Number of boosting iterations
- $\nu_m$ = Learning rate for iteration $m$
- $h_m(x)$ = Weak learner (tree) at iteration $m$

Gradient Boosting Algorithm:

Step 1: Initialize with constant prediction:

$$F_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)$$

Step 2: For $m = 1, \dots, M$:

Compute negative gradients:

$$r_{im} = -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}$$

Fit weak learner: fit a regression tree $h_m(x)$ to the targets $\{(x_i, r_{im})\}_{i=1}^{N}$

Update model:

$$F_m(x) = F_{m-1}(x) + \nu\, h_m(x)$$

Symbol Definitions:
- $L(y, F(x))$ = Loss function
- $r_{im}$ = Residual (negative gradient) for sample $i$ at iteration $m$
- $\nu$ = Learning rate (shrinkage parameter)
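A minimal from-scratch sketch of this loop for squared-error loss, where the negative gradient is simply the residual; the synthetic data and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_iter=100, learning_rate=0.1, max_depth=3):
    """Minimal gradient boosting for squared-error loss L(y, F) = (y - F)^2 / 2."""
    F0 = y.mean()                          # Step 1: constant prediction minimizing the loss
    pred = np.full_like(y, F0, dtype=float)
    trees = []
    for _ in range(n_iter):                # Step 2: iterate
        residuals = y - pred               # negative gradients for squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)   # F_m = F_{m-1} + nu * h_m
        trees.append(tree)
    return F0, trees

def gradient_boost_predict(X, F0, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], F0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 500)
F0, trees = gradient_boost_fit(X, y)
print(np.mean((y - gradient_boost_predict(X, F0, trees)) ** 2))   # training MSE
```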
XGBoost (Extreme Gradient Boosting)
Advanced gradient boosting with regularization:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{N} l\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

Regularization Term:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|$$

Symbol Definitions:
- $\mathcal{L}^{(t)}$ = Loss at iteration $t$
- $l$ = Differentiable loss function
- $\hat{y}_i^{(t-1)}$ = Prediction from previous iterations
- $f_t$ = New tree at iteration $t$
- $T$ = Number of leaves in tree
- $w_j$ = Weight of leaf $j$
- $\gamma, \lambda, \alpha$ = Regularization parameters
Financial Services Example: Fraud Detection
Business Context: A payment processor uses XGBoost to detect fraudulent transactions in real time, balancing detection accuracy against false positive rates to maintain customer experience.
Transaction Features:
- $x_1$ = Transaction amount (log-scaled)
- $x_2$ = Time since last transaction (minutes)
- $x_3$ = Merchant category risk score
- $x_4$ = Geographic velocity (distance/time)
- $x_5$ = Card-not-present indicator
- $x_6$ = Historical decline rate for merchant
- $x_7$ = Account age (days)
- $x_8$ = Spending pattern deviation score
XGBoost Configuration:
- Objective: Binary logistic regression
- Trees: 500 estimators
- Max Depth: 6 levels
- Learning Rate: 0.1
- Regularization: α=0.1, λ=1.0
Class Imbalance Handling: fraud is rare, so fraud cases receive larger weights in the training objective.

Custom Loss Function (weighted binary cross-entropy):

$$L = -\sum_{i=1}^{N} w_i \left[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \right]$$

Symbol Definitions:
- $w_i$ = Sample weight (higher for fraud cases)
- $\hat{p}_i$ = Predicted fraud probability
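A sketch of this configuration using the xgboost scikit-learn wrapper; here the rare class is up-weighted via scale_pos_weight, a built-in alternative to explicit per-sample weights, and the synthetic data and fraud rate are only stand-ins for real transactions.

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for transaction data with roughly 1% fraud
X, y = make_classification(n_samples=50_000, n_features=8, weights=[0.99, 0.01], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Up-weight the rare fraud class, mirroring the weighted loss above
pos_weight = (y_train == 0).sum() / (y_train == 1).sum()

model = xgb.XGBClassifier(
    objective="binary:logistic",  # binary logistic objective
    n_estimators=500,             # Trees: 500 estimators
    max_depth=6,                  # Max Depth: 6 levels
    learning_rate=0.1,            # Learning Rate: 0.1
    reg_alpha=0.1,                # alpha: L1 regularization
    reg_lambda=1.0,               # lambda: L2 regularization
    scale_pos_weight=pos_weight,
    eval_metric="aucpr",          # precision-recall AUC suits heavy class imbalance
)
model.fit(X_train, y_train)
fraud_scores = model.predict_proba(X_test)[:, 1]   # predicted fraud probabilities
```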
Feature Importance:
- Amount Log-Scale (0.24) - Primary fraud indicator
- Geographic Velocity (0.19) - Location anomaly
- Merchant Risk Score (0.16) - Business type risk
- Pattern Deviation (0.15) - Behavioral anomaly
- Card-not-Present (0.12) - Transaction type
- Time Velocity (0.08) - Timing patterns
- Account Age (0.04) - Account maturity
- Decline History (0.02) - Merchant reputation
Decision Thresholds:
Business Performance:
- Fraud Detection Rate: 94.2% (vs. 87.3% rule-based)
- False Positive Rate: 0.8% (vs. 1.2% previous)
- Processing Speed: 15ms average response time
- Cost Savings: $45M annual fraud prevention
- Customer Impact: 35% fewer legitimate transactions declined
Model Interpretability
SHAP (SHapley Additive exPlanations)
Explains individual predictions by attributing the model output to each feature via Shapley values:

$$\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!} \left[ f_{S \cup \{j\}}\bigl(x_{S \cup \{j\}}\bigr) - f_S(x_S) \right]$$

Symbol Definitions:
- $\phi_j$ = SHAP value for feature $j$
- $S$ = Subset of features excluding $j$
- $F$ = Set of all features
- $f_S(x_S)$ = Model prediction using only the features in set $S$
TreeSHAP for Tree-Based Models: Efficient computation for decision trees and ensembles.
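A minimal sketch using the shap package (assumed installed); the model and data are placeholders for any fitted tree ensemble.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer implements TreeSHAP: exact Shapley values computed efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])   # per-sample, per-feature attributions

# Attributions are additive: expected value + sum of SHAP values recovers the model output
print(explainer.expected_value)
```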
Partial Dependence Plots
Shows the average effect of a feature (or feature set) on predictions:

$$\text{PD}_S(x_S) = \mathbb{E}_{X_C}\bigl[f(x_S, X_C)\bigr] \approx \frac{1}{N} \sum_{i=1}^{N} f\bigl(x_S, x_C^{(i)}\bigr)$$

Symbol Definitions:
- $\text{PD}_S(x_S)$ = Partial dependence of features $S$
- $x_S$ = Features of interest
- $X_C$ = Complementary features
- $\mathbb{E}_{X_C}$ = Expectation over complementary features
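The scikit-learn sketch below computes and plots partial dependence for a fitted boosting model; the dataset and feature indices are arbitrary placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay, partial_dependence

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Average the model's prediction over the data while sweeping feature 0 across a grid
pd_result = partial_dependence(model, X, features=[0], kind="average")
print(pd_result["average"].shape)

# Or plot partial dependence curves for features 0 and 3 directly
PartialDependenceDisplay.from_estimator(model, X, features=[0, 3])
```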
Advanced Tree Techniques
LightGBM
Gradient boosting with leaf-wise tree growth:
Leaf-wise Growth: Grows trees by adding leaves that reduce loss most
Categorical Feature Handling: Direct support without one-hot encoding
Memory Efficiency: Optimized data structures for large datasets
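A small sketch of LightGBM's native categorical handling via its scikit-learn wrapper; the data frame, column names, and target are synthetic placeholders.

```python
import lightgbm as lgb
import numpy as np
import pandas as pd

# Small synthetic frame with one categorical column (no one-hot encoding required)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount": rng.lognormal(3, 1, 5000),
    "tenure_months": rng.integers(1, 120, 5000),
    "region": pd.Categorical(rng.choice(["north", "south", "east", "west"], 5000)),
})
y = (df["amount"] > df["amount"].median()).astype(int)

model = lgb.LGBMClassifier(
    n_estimators=300,
    num_leaves=31,       # leaf-wise growth is controlled by leaf count rather than depth
    learning_rate=0.05,
)
model.fit(df, y)  # pandas 'category' dtype columns are treated as categorical splits automatically
```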
CatBoost
Handles categorical features automatically:
Ordered Target Statistics: each categorical value is replaced by a target statistic computed only from samples that precede it in a random permutation of the training data, which avoids target leakage:

$$\hat{x}_k^{(i)} = \frac{\sum_{j < i} \mathbb{1}\bigl(x_k^{(j)} = x_k^{(i)}\bigr)\, y_j + a\, p}{\sum_{j < i} \mathbb{1}\bigl(x_k^{(j)} = x_k^{(i)}\bigr) + a}$$

Symbol Definitions:
- $\hat{x}_k^{(i)}$ = Target statistic for categorical feature $k$ at sample $i$
- $a$ = Smoothing parameter
- $p$ = Prior (typically the average target value)
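A short pure-Python sketch of this statistic for a single categorical column; the smoothing value and toy data are illustrative only.

```python
import numpy as np

def ordered_target_statistics(categories, targets, a=1.0, rng=None):
    """Ordered target statistic for one categorical feature (CatBoost-style sketch).

    Each sample's statistic uses only samples that appear *before* it in a
    random permutation, which prevents target leakage.
    """
    rng = rng or np.random.default_rng(0)
    n = len(categories)
    prior = float(np.mean(targets))   # p: prior, here the global target mean
    order = rng.permutation(n)        # random ordering of the training samples
    sums, counts = {}, {}
    stats = np.empty(n)
    for i in order:
        c = categories[i]
        stats[i] = (sums.get(c, 0.0) + a * prior) / (counts.get(c, 0) + a)
        sums[c] = sums.get(c, 0.0) + targets[i]   # update running statistics afterwards
        counts[c] = counts.get(c, 0) + 1
    return stats

cats = np.array(["red", "blue", "red", "red", "blue"])
y = np.array([1, 0, 1, 0, 1])
print(ordered_target_statistics(cats, y))
```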
Tree-based methods provide powerful, interpretable solutions for complex prediction problems, offering excellent performance on structured data while preserving the explainability that financial services and retail applications require.