Tree-Based Methods
Tree-based methods create predictive models using decision trees that partition the feature space into regions. In financial services, they enable interpretable credit decisions and fraud detection. In retail, they power customer segmentation and inventory optimization.
Decision Trees
Mathematical Foundation
Decision trees recursively partition the feature space into disjoint regions:

$$\mathcal{X} = \bigcup_{m=1}^{M} R_m, \qquad R_m \cap R_{m'} = \emptyset \ \text{for } m \neq m'$$

Symbol Definitions:
- $\mathcal{X}$ = Feature space (entire input domain)
- $R_m$ = Region $m$ (leaf node)
- $M$ = Number of leaf nodes
- $\emptyset$ = Empty set (regions don't overlap)

Prediction Function:

$$\hat{f}(x) = \sum_{m=1}^{M} c_m \, \mathbb{1}(x \in R_m)$$

Symbol Definitions:
- $\hat{f}(x)$ = Tree prediction for input $x$
- $c_m$ = Constant prediction in region $R_m$
- $\mathbb{1}(x \in R_m)$ = Indicator function (1 if $x$ is in region $R_m$, 0 otherwise)
Splitting Criterion
For Regression (Mean Squared Error): choose the split that minimizes the total squared error of the two resulting regions.

Optimal Split:

$$(j^*, s^*) = \arg\min_{j,\, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j, s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j, s)} (y_i - c_2)^2 \right]$$

where $R_1(j, s) = \{x \mid x_j \le s\}$ and $R_2(j, s) = \{x \mid x_j > s\}$, and the inner minimizations are solved by the region means $\hat{c}_1$ and $\hat{c}_2$.

Symbol Definitions:
- $j$ = Feature index for splitting
- $s$ = Split threshold value
- $R_1(j, s)$ = Left region
- $R_2(j, s)$ = Right region
- $\hat{c}_1, \hat{c}_2$ = Optimal constants for each region
For Classification (Gini Impurity):

$$G(t) = \sum_{k=1}^{K} p_k(t)\,\bigl(1 - p_k(t)\bigr) = 1 - \sum_{k=1}^{K} p_k(t)^2$$

Symbol Definitions:
- $t$ = Tree node
- $K$ = Number of classes
- $p_k(t)$ = Proportion of class $k$ in node $t$
Information Gain:

$$\Delta I = I(\text{parent}) - \sum_{c \,\in\, \text{children}} \frac{N_c}{N}\, I(c)$$

Symbol Definitions:
- $\Delta I$ = Information gain from split
- $N_c$ = Number of samples in child $c$
- $N$ = Total samples in parent
- $I(\cdot)$ = Impurity measure (Gini impurity or entropy)
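As a concrete illustration, the short Python sketch below computes Gini impurity and information gain for a toy candidate split; the labels and the split itself are made up purely for illustration.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    """Impurity reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

# Toy example: binary labels separated by a candidate threshold
parent = np.array([0, 0, 0, 1, 1, 1, 1, 0])
left = np.array([0, 0, 0, 0])    # samples with x_j <= s
right = np.array([1, 1, 1, 1])   # samples with x_j > s
print(gini(parent))                             # 0.5
print(information_gain(parent, left, right))    # 0.5 (a perfect split)
```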
Financial Services Example: Loan Default Prediction
Business Context: A credit union uses a decision tree to make interpretable loan approval decisions while meeting regulatory compliance and explainable-AI requirements.
Features:
- $x_1$ = Annual income ($000s)
- $x_2$ = Credit score (300-850)
- $x_3$ = Debt-to-income ratio (%)
- $x_4$ = Employment years
- $x_5$ = Loan amount requested ($000s)
Decision Tree Structure: the root node split separates applications into a high-risk branch and a low-risk branch; the high-risk branch is refined by a further split on the DTI ratio, and each root-to-leaf path yields an explicit approve/decline decision rule.
Business Performance:
- Accuracy: 82.7% correct predictions
- Interpretability: Clear decision path for each application
- Regulatory Compliance: Meets fair lending documentation requirements
- Processing Speed: 1,000 applications per hour automated
- Cost Reduction: 60% fewer manual reviews required
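A minimal scikit-learn sketch of this kind of interpretable approval model is shown below; the synthetic data, feature names, and thresholds are hypothetical stand-ins, not the credit union's actual model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical feature matrix: [income_k, credit_score, dti_pct, employment_yrs, loan_amount_k]
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(65, 20, 500),      # annual income ($000s)
    rng.integers(300, 851, 500),  # credit score
    rng.uniform(5, 60, 500),      # debt-to-income ratio (%)
    rng.uniform(0, 30, 500),      # employment years
    rng.normal(25, 10, 500),      # loan amount requested ($000s)
])
# Synthetic default labels: higher DTI and lower credit score raise default risk
y = ((X[:, 2] > 40) & (X[:, 1] < 650)).astype(int)

# A shallow tree keeps each decision path short enough to document for auditors
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=25, criterion="gini")
tree.fit(X, y)

# Human-readable rules: one root-to-leaf path per approve/decline decision
feature_names = ["income_k", "credit_score", "dti_pct", "employment_yrs", "loan_amount_k"]
print(export_text(tree, feature_names=feature_names))
```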
Random Forest
Ensemble of Trees
Combines multiple decision trees using bagging:

$$\hat{f}_{RF}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)$$

Symbol Definitions:
- $\hat{f}_{RF}(x)$ = Random Forest prediction
- $B$ = Number of trees (typically 100-500)
- $\hat{f}_b(x)$ = Prediction from tree $b$

Bootstrap Sampling: Each tree is trained on a bootstrap sample $\mathcal{D}_b$ drawn with replacement from the training data $\mathcal{D}$, so each tree sees a different random subset of the data.

Random Feature Selection: At each split, only a random subset of $m$ features is considered:

$$m = \lfloor \sqrt{p} \rfloor \ \ \text{(classification)}, \qquad m = \lfloor p/3 \rfloor \ \ \text{(regression)}$$

Symbol Definitions:
- $p$ = Total number of features
- $m$ = Number of features considered at each split
- $\lfloor \cdot \rfloor$ = Floor function
Feature Importance
Gini Importance:

$$\text{Imp}(x_j) = \sum_{t \,:\, \text{split on } x_j} p(t)\, \Delta G(t)$$

Permutation Importance:

$$\text{Imp}(x_j) = e^{OOB}_{\pi(j)} - e^{OOB}$$

Symbol Definitions:
- $p(t)$ = Proportion of samples reaching node $t$
- $\Delta G(t)$ = Gini improvement from the split at node $t$
- $e^{OOB}_{\pi(j)}$ = Out-of-bag error after permuting feature $j$
- $e^{OOB}$ = Original out-of-bag error
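The scikit-learn snippet below sketches both importance measures; the synthetic dataset and hyperparameters are placeholders, not a recommended configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; any tabular (X, y) works the same way
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=300,     # B: number of trees
    max_features="sqrt",  # m = floor(sqrt(p)) features considered per split
    oob_score=True,       # track out-of-bag accuracy
    random_state=0,
).fit(X_train, y_train)

print("OOB accuracy:", rf.oob_score_)
print("Gini importances:", rf.feature_importances_)

# Permutation importance: drop in score after shuffling each feature on held-out data
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print("Permutation importances:", perm.importances_mean)
```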
Retail Example: Customer Segmentation
Business Context: A fashion retailer uses a Random Forest to segment customers for targeted marketing campaigns, personalizing offers based on purchase behavior and demographics.
Customer Features:
- $x_1$ = Average order value ($)
- $x_2$ = Purchase frequency (orders/year)
- $x_3$ = Customer tenure (months)
- $x_4$ = Product category diversity (1-10 scale)
- $x_5$ = Seasonal purchasing pattern (encoded)
- $x_6$ = Return rate (%)
- $x_7$ = Age group (encoded)
- $x_8$ = Geographic region (encoded)
Target Segments:
- High-Value Loyalists (15% of customers, 45% of revenue)
- Frequent Shoppers (25% of customers, 35% of revenue)
- Occasional Buyers (40% of customers, 15% of revenue)
- At-Risk Churners (20% of customers, 5% of revenue)
Random Forest Model:
- Trees: 200 decision trees
- Max Depth: 15 levels
- Min Samples per Leaf: 50 customers
- Features per Split: 3 (√8 ≈ 2.8, rounded up to 3)
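The sketch below maps this configuration onto scikit-learn hyperparameters; the variable names (X_customers, y_segment) are hypothetical placeholders for the retailer's actual data.

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative mapping of the configuration above onto scikit-learn arguments
segment_model = RandomForestClassifier(
    n_estimators=200,         # Trees: 200 decision trees
    max_depth=15,             # Max Depth: 15 levels
    min_samples_leaf=50,      # Min Samples per Leaf: 50 customers
    max_features=3,           # Features per Split: 3 of the 8 customer features
    class_weight="balanced",  # segment sizes are imbalanced (15% / 25% / 40% / 20%)
    random_state=42,
)
# segment_model.fit(X_customers, y_segment)  # X_customers, y_segment are hypothetical
```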
Feature Importance Ranking:
- Average Order Value (0.28) - Primary value indicator
- Purchase Frequency (0.22) - Engagement level
- Customer Tenure (0.18) - Loyalty measure
- Product Diversity (0.15) - Shopping breadth
- Return Rate (0.08) - Satisfaction proxy
- Age Group (0.05) - Demographic factor
- Seasonal Pattern (0.03) - Timing behavior
- Geographic Region (0.01) - Location influence
Segmentation Results:
High-Value Loyalists:
- Average Order Value: $180+
- Purchase Frequency: 8+ times/year
- Customer Tenure: 24+ months
- Product Diversity: 7+ categories
Marketing Strategy per Segment: campaign budget is allocated across segments based on three quantities:

Symbol Definitions:
- $\text{CLV}_s$ = Customer Lifetime Value for segment $s$
- $r_s$ = Historical campaign response rate
- $\eta_s$ = Budget allocation efficiency factor
Business Impact:
- Campaign ROI: 340% vs. 180% for mass marketing
- Customer Retention: 23% improvement in high-value segment
- Cross-selling Success: 45% increase in product diversity
- Revenue Growth: $2.8M additional quarterly revenue
Gradient Boosting
Boosting Algorithm
Sequentially builds trees to correct previous errors:

$$F_M(x) = F_0(x) + \sum_{m=1}^{M} \nu_m h_m(x)$$

Symbol Definitions:
- $F_M(x)$ = Final boosted prediction
- $M$ = Number of boosting iterations
- $\nu_m$ = Learning rate for iteration $m$
- $h_m(x)$ = Weak learner (tree) at iteration $m$

Gradient Boosting Algorithm:

Step 1: Initialize with constant prediction:

$$F_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)$$

Step 2: For $m = 1, \dots, M$:

Compute negative gradients:

$$r_{im} = -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}$$

Fit weak learner: fit a regression tree $h_m(x)$ to the targets $\{(x_i, r_{im})\}_{i=1}^{N}$

Update model:

$$F_m(x) = F_{m-1}(x) + \nu\, h_m(x)$$

Symbol Definitions:
- $L(y, F(x))$ = Loss function
- $r_{im}$ = Residual (negative gradient) for sample $i$ at iteration $m$
- $\nu$ = Learning rate (shrinkage parameter)
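A minimal from-scratch sketch of this loop for squared-error loss, where the negative gradient is simply the residual; the synthetic data and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_iter=100, learning_rate=0.1, max_depth=3):
    """Minimal gradient boosting for squared-error loss L(y, F) = (y - F)^2 / 2."""
    F0 = y.mean()                          # Step 1: constant prediction minimizing the loss
    pred = np.full_like(y, F0, dtype=float)
    trees = []
    for _ in range(n_iter):                # Step 2: iterate
        residuals = y - pred               # negative gradients for squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)   # F_m = F_{m-1} + nu * h_m
        trees.append(tree)
    return F0, trees

def gradient_boost_predict(X, F0, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], F0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 500)
F0, trees = gradient_boost_fit(X, y)
print(np.mean((y - gradient_boost_predict(X, F0, trees)) ** 2))   # training MSE
```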
XGBoost (Extreme Gradient Boosting)
Advanced gradient boosting with regularization:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{N} l\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

Regularization Term:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|$$

Symbol Definitions:
- $\mathcal{L}^{(t)}$ = Loss at iteration $t$
- $l$ = Differentiable loss function
- $\hat{y}_i^{(t-1)}$ = Prediction from previous iterations
- $f_t$ = New tree at iteration $t$
- $T$ = Number of leaves in tree
- $w_j$ = Weight of leaf $j$
- $\gamma, \lambda, \alpha$ = Regularization parameters
Financial Services Example: Fraud Detection
Business Context: A payment processor uses XGBoost to detect fraudulent transactions in real time, balancing detection accuracy against false positive rates to maintain customer experience.
Transaction Features:
- $x_1$ = Transaction amount (log-scaled)
- $x_2$ = Time since last transaction (minutes)
- $x_3$ = Merchant category risk score
- $x_4$ = Geographic velocity (distance/time)
- $x_5$ = Card-not-present indicator
- $x_6$ = Historical decline rate for merchant
- $x_7$ = Account age (days)
- $x_8$ = Spending pattern deviation score
XGBoost Configuration:
- Objective: Binary logistic regression
- Trees: 500 estimators
- Max Depth: 6 levels
- Learning Rate: 0.1
- Regularization: α=0.1, λ=1.0
Class Imbalance Handling: fraud is rare, so fraud cases receive larger weights in the training objective.

Custom Loss Function (weighted binary cross-entropy):

$$L = -\sum_{i=1}^{N} w_i \left[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \right]$$

Symbol Definitions:
- $w_i$ = Sample weight (higher for fraud cases)
- $\hat{p}_i$ = Predicted fraud probability
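A sketch of this configuration using the xgboost scikit-learn wrapper; here the rare class is up-weighted via scale_pos_weight, a built-in alternative to explicit per-sample weights, and the synthetic data and fraud rate are only stand-ins for real transactions.

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for transaction data with roughly 1% fraud
X, y = make_classification(n_samples=50_000, n_features=8, weights=[0.99, 0.01], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Up-weight the rare fraud class, mirroring the weighted loss above
pos_weight = (y_train == 0).sum() / (y_train == 1).sum()

model = xgb.XGBClassifier(
    objective="binary:logistic",  # binary logistic objective
    n_estimators=500,             # Trees: 500 estimators
    max_depth=6,                  # Max Depth: 6 levels
    learning_rate=0.1,            # Learning Rate: 0.1
    reg_alpha=0.1,                # alpha: L1 regularization
    reg_lambda=1.0,               # lambda: L2 regularization
    scale_pos_weight=pos_weight,
    eval_metric="aucpr",          # precision-recall AUC suits heavy class imbalance
)
model.fit(X_train, y_train)
fraud_scores = model.predict_proba(X_test)[:, 1]   # predicted fraud probabilities
```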
Feature Importance:
- Amount Log-Scale (0.24) - Primary fraud indicator
- Geographic Velocity (0.19) - Location anomaly
- Merchant Risk Score (0.16) - Business type risk
- Pattern Deviation (0.15) - Behavioral anomaly
- Card-not-Present (0.12) - Transaction type
- Time Velocity (0.08) - Timing patterns
- Account Age (0.04) - Account maturity
- Decline History (0.02) - Merchant reputation
Decision Thresholds:
Business Performance:
- Fraud Detection Rate: 94.2% (vs. 87.3% rule-based)
- False Positive Rate: 0.8% (vs. 1.2% previous)
- Processing Speed: 15ms average response time
- Cost Savings: $45M annual fraud prevention
- Customer Impact: 35% fewer legitimate transactions declined
Model Interpretability
SHAP (SHapley Additive exPlanations)
Explains individual predictions by attributing the model output to each feature via Shapley values:

$$\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!} \left[ f_{S \cup \{j\}}\bigl(x_{S \cup \{j\}}\bigr) - f_S(x_S) \right]$$

Symbol Definitions:
- $\phi_j$ = SHAP value for feature $j$
- $S$ = Subset of features excluding $j$
- $F$ = Set of all features
- $f_S(x_S)$ = Model prediction using only the features in set $S$
TreeSHAP for Tree-Based Models: Efficient computation for decision trees and ensembles.
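A minimal sketch using the shap package (assumed installed); the model and data are placeholders for any fitted tree ensemble.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer implements TreeSHAP: exact Shapley values computed efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])   # per-sample, per-feature attributions

# Attributions are additive: expected value + sum of SHAP values recovers the model output
print(explainer.expected_value)
```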
Partial Dependence Plots
Shows the average effect of a feature (or feature set) on predictions:

$$\text{PD}_S(x_S) = \mathbb{E}_{X_C}\bigl[f(x_S, X_C)\bigr] \approx \frac{1}{N} \sum_{i=1}^{N} f\bigl(x_S, x_C^{(i)}\bigr)$$

Symbol Definitions:
- $\text{PD}_S(x_S)$ = Partial dependence of features $S$
- $x_S$ = Features of interest
- $X_C$ = Complementary features
- $\mathbb{E}_{X_C}$ = Expectation over complementary features
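The scikit-learn sketch below computes and plots partial dependence for a fitted boosting model; the dataset and feature indices are arbitrary placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay, partial_dependence

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Average the model's prediction over the data while sweeping feature 0 across a grid
pd_result = partial_dependence(model, X, features=[0], kind="average")
print(pd_result["average"].shape)

# Or plot partial dependence curves for features 0 and 3 directly
PartialDependenceDisplay.from_estimator(model, X, features=[0, 3])
```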
Advanced Tree Techniques
LightGBM
Gradient boosting with leaf-wise tree growth:
Leaf-wise Growth: Grows trees by adding leaves that reduce loss most
Categorical Feature Handling: Direct support without one-hot encoding
Memory Efficiency: Optimized data structures for large datasets
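A small sketch of LightGBM's native categorical handling via its scikit-learn wrapper; the data frame, column names, and target are synthetic placeholders.

```python
import lightgbm as lgb
import numpy as np
import pandas as pd

# Small synthetic frame with one categorical column (no one-hot encoding required)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount": rng.lognormal(3, 1, 5000),
    "tenure_months": rng.integers(1, 120, 5000),
    "region": pd.Categorical(rng.choice(["north", "south", "east", "west"], 5000)),
})
y = (df["amount"] > df["amount"].median()).astype(int)

model = lgb.LGBMClassifier(
    n_estimators=300,
    num_leaves=31,       # leaf-wise growth is controlled by leaf count rather than depth
    learning_rate=0.05,
)
model.fit(df, y)  # pandas 'category' dtype columns are treated as categorical splits automatically
```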
CatBoost
Handles categorical features automatically:
Ordered Target Statistics: each categorical value is replaced by a target statistic computed only from samples that precede it in a random permutation of the training data, which avoids target leakage:

$$\hat{x}_k^{(i)} = \frac{\sum_{j < i} \mathbb{1}\bigl(x_k^{(j)} = x_k^{(i)}\bigr)\, y_j + a\, p}{\sum_{j < i} \mathbb{1}\bigl(x_k^{(j)} = x_k^{(i)}\bigr) + a}$$

Symbol Definitions:
- $\hat{x}_k^{(i)}$ = Target statistic for categorical feature $k$ at sample $i$
- $a$ = Smoothing parameter
- $p$ = Prior (typically the average target value)
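A short pure-Python sketch of this statistic for a single categorical column; the smoothing value and toy data are illustrative only.

```python
import numpy as np

def ordered_target_statistics(categories, targets, a=1.0, rng=None):
    """Ordered target statistic for one categorical feature (CatBoost-style sketch).

    Each sample's statistic uses only samples that appear *before* it in a
    random permutation, which prevents target leakage.
    """
    rng = rng or np.random.default_rng(0)
    n = len(categories)
    prior = float(np.mean(targets))   # p: prior, here the global target mean
    order = rng.permutation(n)        # random ordering of the training samples
    sums, counts = {}, {}
    stats = np.empty(n)
    for i in order:
        c = categories[i]
        stats[i] = (sums.get(c, 0.0) + a * prior) / (counts.get(c, 0) + a)
        sums[c] = sums.get(c, 0.0) + targets[i]   # update running statistics afterwards
        counts[c] = counts.get(c, 0) + 1
    return stats

cats = np.array(["red", "blue", "red", "red", "blue"])
y = np.array([1, 0, 1, 0, 1])
print(ordered_target_statistics(cats, y))
```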
Tree-based methods provide powerful, interpretable solutions for complex prediction problems, offering excellent performance on structured data while preserving the explainability that financial services and retail applications require.