Machine Learning
Unsupervised Learning
Association Rules

Association Rules

Association rule learning discovers relationships between variables in large databases, identifying frequent patterns, associations, and causal structures. It's widely used in market basket analysis, recommendation systems, and web usage mining.

Mathematical Foundations

Basic Terminology

  • Itemset: A collection of items (e.g., butter)
  • Transaction: A set of items purchased/used together
  • Support: Frequency of itemset occurrence in dataset
  • Confidence: Likelihood of consequent given antecedent
  • Lift: Ratio of observed to expected frequency if independent

Association Rule Structure

Example: If buttermilk

Key Metrics

Support: Proportion of transactions containing the itemset

Confidence: Conditional probability of consequent given antecedent

Lift: Ratio of observed to expected support

Rust Implementation

use std::collections::{HashMap, HashSet};
use std::hash::Hash;
 
/// Association rule metrics calculator for market basket analysis.
#[derive(Debug, Clone)]
pub struct AssociationRuleMetrics {
    /// Transaction database
    transactions: Vec<HashSet<String>>,
    /// Total number of transactions
    n_transactions: usize,
    /// Item frequency counts
    item_counts: HashMap<String, usize>,
}
 
impl AssociationRuleMetrics {
    /// Create new association rule metrics calculator.
    ///
    /// # Arguments
    /// * `transactions` - Vector of transactions, each containing item names
    pub fn new(transactions: Vec<Vec<String>>) -> Self {
        let transaction_sets: Vec<HashSet<String>> = transactions
            .into_iter()
            .map(|t| t.into_iter().collect())
            .collect();
        
        let n_transactions = transaction_sets.len();
        let item_counts = Self::count_items(&transaction_sets);
        
        Self {
            transactions: transaction_sets,
            n_transactions,
            item_counts,
        }
    }
 
    /// Count frequency of each item across all transactions.
    fn count_items(transactions: &[HashSet<String>]) -> HashMap<String, usize> {
        let mut item_counts = HashMap::new();
        
        for transaction in transactions {
            for item in transaction {
                *item_counts.entry(item.clone()).or_insert(0) += 1;
            }
        }
        
        item_counts
    }
 
    /// Calculate support for an itemset.
    ///
    /// # Arguments
    /// * `itemset` - Set of items to calculate support for
    ///
    /// # Returns
    /// Support value between 0.0 and 1.0
    pub fn support(&self, itemset: &HashSet<String>) -> f64 {
        let count = self.transactions
            .iter()
            .filter(|transaction| itemset.is_subset(transaction))
            .count();
        
        count as f64 / self.n_transactions as f64
    }
 
    /// Calculate support for a single item.
    pub fn item_support(&self, item: &str) -> f64 {
        let mut itemset = HashSet::new();
        itemset.insert(item.to_string());
        self.support(&itemset)
    }
 
    /// Calculate confidence for association rule A → B.
    pub fn confidence(&self, antecedent: &HashSet<String>, consequent: &HashSet<String>) -> f64 {
        let union: HashSet<String> = antecedent.union(consequent).cloned().collect();
        let support_union = self.support(&union);
        let support_antecedent = self.support(antecedent);
        
        if support_antecedent > 0.0 {
            support_union / support_antecedent
        } else {
            0.0
        }
    }
 
    /// Calculate lift for association rule A → B.
    pub fn lift(&self, antecedent: &HashSet<String>, consequent: &HashSet<String>) -> f64 {
        let confidence = self.confidence(antecedent, consequent);
        let support_consequent = self.support(consequent);
        
        if support_consequent > 0.0 {
            confidence / support_consequent
        } else {
            0.0
        }
    }
 
    /// Calculate conviction for association rule A → B.
    pub fn conviction(&self, antecedent: &HashSet<String>, consequent: &HashSet<String>) -> f64 {
        let confidence = self.confidence(antecedent, consequent);
        let support_consequent = self.support(consequent);
        
        if confidence < 1.0 {
            (1.0 - support_consequent) / (1.0 - confidence)
        } else {
            f64::INFINITY
        }
    }
 
    /// Calculate leverage (interest) for association rule A → B.
    pub fn leverage(&self, antecedent: &HashSet<String>, consequent: &HashSet<String>) -> f64 {
        let union: HashSet<String> = antecedent.union(consequent).cloned().collect();
        let support_union = self.support(&union);
        let support_antecedent = self.support(antecedent);
        let support_consequent = self.support(consequent);
        
        support_union - (support_antecedent * support_consequent)
    }
}
 
/// Association rule representation.
#[derive(Debug, Clone)]
pub struct AssociationRule {
    pub antecedent: HashSet<String>,
    pub consequent: HashSet<String>,
    pub support: f64,
    pub confidence: f64,
    pub lift: f64,
    pub conviction: f64,
    pub leverage: f64,
}
 
impl AssociationRule {
    /// Create new association rule with calculated metrics.
    pub fn new(
        antecedent: HashSet<String>,
        consequent: HashSet<String>,
        metrics: &AssociationRuleMetrics,
    ) -> Self {
        let union: HashSet<String> = antecedent.union(&consequent).cloned().collect();
        let support = metrics.support(&union);
        let confidence = metrics.confidence(&antecedent, &consequent);
        let lift = metrics.lift(&antecedent, &consequent);
        let conviction = metrics.conviction(&antecedent, &consequent);
        let leverage = metrics.leverage(&antecedent, &consequent);
 
        Self {
            antecedent,
            consequent,
            support,
            confidence,
            lift,
            conviction,
            leverage,
        }
    }
 
    /// Display rule in readable format.
    pub fn to_string(&self) -> String {
        let antecedent: Vec<String> = self.antecedent.iter().cloned().collect();
        let consequent: Vec<String> = self.consequent.iter().cloned().collect();
        
        format!("{{{}}}} → {{{}}}", antecedent.join(", "), consequent.join(", "))
    }
}

Apriori Algorithm

The Apriori algorithm finds frequent itemsets by using the downward closure property: if an itemset is infrequent, all its supersets are also infrequent.

/// Apriori algorithm implementation for frequent itemset mining.
pub struct AprioriAlgorithm {
    /// Minimum support threshold
    min_support: f64,
    /// Minimum confidence threshold
    min_confidence: f64,
    /// Frequent itemsets by level
    frequent_itemsets: HashMap<usize, HashMap<HashSet<String>, f64>>,
    /// Generated association rules
    rules: Vec<AssociationRule>,
}
 
impl AprioriAlgorithm {
    /// Create new Apriori algorithm instance.
    pub fn new(min_support: f64, min_confidence: f64) -> Self {
        Self {
            min_support,
            min_confidence,
            frequent_itemsets: HashMap::new(),
            rules: Vec::new(),
        }
    }
 
    /// Generate candidate itemsets of size k.
    fn generate_candidates(&self, frequent_k_minus_1: &[HashSet<String>]) -> Vec<HashSet<String>> {
        let mut candidates = Vec::new();
        let items: Vec<&HashSet<String>> = frequent_k_minus_1.iter().collect();
        
        for i in 0..items.len() {
            for j in (i + 1)..items.len() {
                let union: HashSet<String> = items[i].union(items[j]).cloned().collect();
                
                // Only add if union has exactly k items
                if union.len() == items[i].len() + 1 {
                    // Check if all (k-1)-subsets are frequent
                    if self.has_frequent_subsets(&union, frequent_k_minus_1) {
                        candidates.push(union);
                    }
                }
            }
        }
        
        // Remove duplicates
        let mut unique_candidates = Vec::new();
        for candidate in candidates {
            if !unique_candidates.contains(&candidate) {
                unique_candidates.push(candidate);
            }
        }
        
        unique_candidates
    }
 
    /// Check if all (k-1)-subsets of itemset are frequent.
    fn has_frequent_subsets(&self, itemset: &HashSet<String>, frequent_k_minus_1: &[HashSet<String>]) -> bool {
        let k = itemset.len();
        if k <= 1 {
            return true;
        }
 
        // Generate all (k-1)-subsets
        let items: Vec<String> = itemset.iter().cloned().collect();
        for i in 0..items.len() {
            let mut subset = itemset.clone();
            subset.remove(&items[i]);
            
            if !frequent_k_minus_1.contains(&subset) {
                return false;
            }
        }
        
        true
    }
 
    /// Calculate support for itemsets.
    fn calculate_support(&self, itemsets: &[HashSet<String>], transactions: &[HashSet<String>]) -> HashMap<HashSet<String>, f64> {
        let mut support_counts = HashMap::new();
        
        for itemset in itemsets {
            let count = transactions
                .iter()
                .filter(|transaction| itemset.is_subset(transaction))
                .count();
            
            let support = count as f64 / transactions.len() as f64;
            support_counts.insert(itemset.clone(), support);
        }
        
        support_counts
    }
 
    /// Find all frequent itemsets using Apriori algorithm.
    pub fn find_frequent_itemsets(&mut self, transactions: &[Vec<String>]) -> &HashMap<usize, HashMap<HashSet<String>, f64>> {
        let transaction_sets: Vec<HashSet<String>> = transactions
            .iter()
            .map(|t| t.iter().cloned().collect())
            .collect();
        
        self.frequent_itemsets.clear();
        
        // Generate 1-itemsets
        let mut all_items = HashSet::new();
        for transaction in &transaction_sets {
            for item in transaction {
                let mut itemset = HashSet::new();
                itemset.insert(item.clone());
                all_items.insert(itemset);
            }
        }
        
        let candidates: Vec<HashSet<String>> = all_items.into_iter().collect();
        let support_map = self.calculate_support(&candidates, &transaction_sets);
        
        // Filter frequent 1-itemsets
        let frequent_1: HashMap<HashSet<String>, f64> = support_map
            .into_iter()
            .filter(|(_, support)| *support >= self.min_support)
            .collect();
        
        if !frequent_1.is_empty() {
            self.frequent_itemsets.insert(1, frequent_1);
        }
        
        // Generate k-itemsets for k > 1
        let mut k = 2;
        
        while let Some(frequent_k_minus_1) = self.frequent_itemsets.get(&(k - 1)) {
            let frequent_k_minus_1_sets: Vec<HashSet<String>> = frequent_k_minus_1.keys().cloned().collect();
            let candidates = self.generate_candidates(&frequent_k_minus_1_sets);
            
            if candidates.is_empty() {
                break;
            }
            
            let support_map = self.calculate_support(&candidates, &transaction_sets);
            let frequent_k: HashMap<HashSet<String>, f64> = support_map
                .into_iter()
                .filter(|(_, support)| *support >= self.min_support)
                .collect();
            
            if frequent_k.is_empty() {
                break;
            }
            
            self.frequent_itemsets.insert(k, frequent_k);
            k += 1;
        }
        
        &self.frequent_itemsets
    }
 
    /// Generate association rules from frequent itemsets.
    pub fn generate_rules(&mut self, transactions: &[Vec<String>]) -> &Vec<AssociationRule> {
        let metrics = AssociationRuleMetrics::new(transactions.to_vec());
        self.rules.clear();
        
        // Generate rules from itemsets of size 2 and above
        for k in 2..=self.frequent_itemsets.len() {
            if let Some(frequent_k) = self.frequent_itemsets.get(&k) {
                for itemset in frequent_k.keys() {
                    // Generate all possible antecedent-consequent splits
                    let items: Vec<String> = itemset.iter().cloned().collect();
                    
                    // Generate all non-empty proper subsets as antecedents
                    for i in 1..(1 << items.len()) - 1 {
                        let mut antecedent = HashSet::new();
                        let mut consequent = HashSet::new();
                        
                        for (j, item) in items.iter().enumerate() {
                            if (i >> j) & 1 == 1 {
                                antecedent.insert(item.clone());
                            } else {
                                consequent.insert(item.clone());
                            }
                        }
                        
                        let rule = AssociationRule::new(antecedent, consequent, &metrics);
                        
                        if rule.confidence >= self.min_confidence {
                            self.rules.push(rule);
                        }
                    }
                }
            }
        }
        
        // Sort rules by confidence * lift
        self.rules.sort_by(|a, b| (b.confidence * b.lift).partial_cmp(&(a.confidence * a.lift)).unwrap());
        
        &self.rules
    }
}

FP-Growth Algorithm

FP-Growth is more efficient than Apriori, avoiding candidate generation by using a compact tree structure.

/// Node in FP-Tree structure.
#[derive(Debug, Clone)]
pub struct FPNode {
    /// Item name
    item: String,
    /// Support count
    count: usize,
    /// Parent node
    parent: Option<Box<FPNode>>,
    /// Child nodes
    children: HashMap<String, Box<FPNode>>,
    /// Link to next node with same item
    next_link: Option<Box<FPNode>>,
}
 
impl FPNode {
    /// Create new FP-Tree node.
    pub fn new(item: String, count: usize) -> Self {
        Self {
            item,
            count,
            parent: None,
            children: HashMap::new(),
            next_link: None,
        }
    }
 
    /// Increment node count.
    pub fn increment(&mut self, count: usize) {
        self.count += count;
    }
}
 
/// FP-Tree implementation for efficient pattern mining.
#[derive(Debug)]
pub struct FPTree {
    /// Root node
    root: FPNode,
    /// Header table mapping items to first occurrence
    header_table: HashMap<String, Box<FPNode>>,
    /// Minimum support threshold
    min_support: f64,
    /// Item frequencies
    item_frequencies: HashMap<String, usize>,
}
 
impl FPTree {
    /// Create new FP-Tree.
    pub fn new(min_support: f64) -> Self {
        Self {
            root: FPNode::new("null".to_string(), 0),
            header_table: HashMap::new(),
            min_support,
            item_frequencies: HashMap::new(),
        }
    }
 
    /// Build FP-Tree from transactions.
    pub fn build(&mut self, transactions: &[Vec<String>]) {
        // First pass: count item frequencies
        let mut item_counts = HashMap::new();
        for transaction in transactions {
            for item in transaction {
                *item_counts.entry(item.clone()).or_insert(0) += 1;
            }
        }
 
        // Filter frequent items
        let min_count = (self.min_support * transactions.len() as f64) as usize;
        self.item_frequencies = item_counts
            .into_iter()
            .filter(|(_, count)| *count >= min_count)
            .collect();
 
        // Second pass: build tree
        for transaction in transactions {
            // Filter and sort items by frequency (descending)
            let mut frequent_items: Vec<String> = transaction
                .iter()
                .filter(|item| self.item_frequencies.contains_key(*item))
                .cloned()
                .collect();
            
            frequent_items.sort_by(|a, b| {
                self.item_frequencies[b].cmp(&self.item_frequencies[a])
            });
 
            if !frequent_items.is_empty() {
                self.insert_transaction(&frequent_items, 1);
            }
        }
    }
 
    /// Insert transaction into FP-Tree.
    fn insert_transaction(&mut self, items: &[String], count: usize) {
        let mut current = &mut self.root;
        
        for item in items {
            if current.children.contains_key(item) {
                current.children.get_mut(item).unwrap().increment(count);
            } else {
                let new_node = Box::new(FPNode::new(item.clone(), count));
                current.children.insert(item.clone(), new_node);
                
                // Update header table if this is the first occurrence
                if !self.header_table.contains_key(item) {
                    self.header_table.insert(item.clone(), current.children[item].clone());
                }
            }
            
            // This is simplified - in a full implementation, you'd need proper tree traversal
        }
    }
}
 
/// FP-Growth algorithm implementation.
pub struct FPGrowth {
    /// Minimum support threshold
    min_support: f64,
    /// Found frequent patterns
    frequent_patterns: Vec<(HashSet<String>, f64)>,
}
 
impl FPGrowth {
    /// Create new FP-Growth instance.
    pub fn new(min_support: f64) -> Self {
        Self {
            min_support,
            frequent_patterns: Vec::new(),
        }
    }
 
    /// Mine frequent patterns using FP-Growth algorithm.
    pub fn mine_patterns(&mut self, transactions: &[Vec<String>]) -> &Vec<(HashSet<String>, f64)> {
        self.frequent_patterns.clear();
        
        // Build FP-Tree
        let mut fp_tree = FPTree::new(self.min_support);
        fp_tree.build(transactions);
        
        // Mine patterns from tree (simplified implementation)
        for (item, frequency) in &fp_tree.item_frequencies {
            let support = *frequency as f64 / transactions.len() as f64;
            if support >= self.min_support {
                let mut itemset = HashSet::new();
                itemset.insert(item.clone());
                self.frequent_patterns.push((itemset, support));
            }
        }
        
        &self.frequent_patterns
    }
}

Practical Applications

Market Basket Analysis Example

use crate::machine_learning::unsupervised::association_rules::{AssociationRuleMetrics, AprioriAlgorithm};
 
/// Comprehensive market basket analysis implementation.
pub fn analyze_market_basket() {
    // Sample market basket transactions
    let transactions = vec![
        vec!["bread".to_string(), "milk".to_string(), "eggs".to_string()],
        vec!["bread".to_string(), "butter".to_string(), "milk".to_string()],
        vec!["milk".to_string(), "eggs".to_string(), "cheese".to_string()],
        vec!["bread".to_string(), "milk".to_string(), "cheese".to_string()],
        vec!["butter".to_string(), "eggs".to_string(), "cheese".to_string()],
        vec!["bread".to_string(), "butter".to_string(), "eggs".to_string()],
        vec!["milk".to_string(), "cheese".to_string()],
        vec!["bread".to_string(), "milk".to_string(), "butter".to_string(), "eggs".to_string()],
        vec!["cheese".to_string(), "eggs".to_string()],
        vec!["bread".to_string(), "milk".to_string()],
    ];
 
    // Calculate basic metrics
    let metrics = AssociationRuleMetrics::new(transactions.clone());
    
    // Create itemsets for analysis
    let mut bread_set = std::collections::HashSet::new();
    bread_set.insert("bread".to_string());
    
    let mut milk_set = std::collections::HashSet::new();
    milk_set.insert("milk".to_string());
    
    let mut bread_milk_set = std::collections::HashSet::new();
    bread_milk_set.insert("bread".to_string());
    bread_milk_set.insert("milk".to_string());
 
    println!("Market Basket Analysis Results:");
    println!("Support(bread): {:.3}", metrics.support(&bread_set));
    println!("Support(milk): {:.3}", metrics.support(&milk_set));
    println!("Support(bread, milk): {:.3}", metrics.support(&bread_milk_set));
    println!("Confidence(bread → milk): {:.3}", metrics.confidence(&bread_set, &milk_set));
    println!("Lift(bread → milk): {:.3}", metrics.lift(&bread_set, &milk_set));
    println!("Conviction(bread → milk): {:.3}", metrics.conviction(&bread_set, &milk_set));
 
    // Apply Apriori algorithm
    let mut apriori = AprioriAlgorithm::new(0.2, 0.5);
    let frequent_itemsets = apriori.find_frequent_itemsets(&transactions);
    
    println!("\nFrequent Itemsets:");
    for (k, itemsets) in frequent_itemsets {
        println!("\nSize {}:", k);
        for (itemset, support) in itemsets {
            let items: Vec<String> = itemset.iter().cloned().collect();
            println!("  {{{}}}: {:.3}", items.join(", "), support);
        }
    }
 
    // Generate association rules
    let rules = apriori.generate_rules(&transactions);
    
    println!("\nTop Association Rules:");
    println!("{:<20} {:<15} {:<8} {:<10} {:<8}", 
             "Antecedent", "Consequent", "Support", "Confidence", "Lift");
    println!("{}", "-".repeat(70));
    
    for rule in rules.iter().take(10) {
        let antecedent: Vec<String> = rule.antecedent.iter().cloned().collect();
        let consequent: Vec<String> = rule.consequent.iter().cloned().collect();
        
        println!("{:<20} → {:<13} {:<8.3} {:<10.3} {:<8.3}",
                 antecedent.join(", "),
                 consequent.join(", "),
                 rule.support,
                 rule.confidence,
                 rule.lift);
    }
}

Recommendation System

/// Association rules-based recommendation system.
pub struct AssociationRecommender {
    /// Trained association rules
    rules: Vec<AssociationRule>,
    /// Item popularity for fallback recommendations
    item_popularity: HashMap<String, f64>,
}
 
impl AssociationRecommender {
    /// Create new recommendation system.
    pub fn new() -> Self {
        Self {
            rules: Vec::new(),
            item_popularity: HashMap::new(),
        }
    }
 
    /// Train recommender on transaction data.
    pub fn train(&mut self, transactions: &[Vec<String>], min_support: f64, min_confidence: f64) {
        let mut apriori = AprioriAlgorithm::new(min_support, min_confidence);
        apriori.find_frequent_itemsets(transactions);
        self.rules = apriori.generate_rules(transactions).clone();
        
        // Calculate item popularity for fallback
        let mut item_counts = HashMap::new();
        for transaction in transactions {
            for item in transaction {
                *item_counts.entry(item.clone()).or_insert(0) += 1;
            }
        }
        
        let total_transactions = transactions.len() as f64;
        for (item, count) in item_counts {
            self.item_popularity.insert(item, count as f64 / total_transactions);
        }
    }
 
    /// Generate recommendations for a given basket.
    pub fn recommend(&self, basket: &[String], n_recommendations: usize) -> Vec<String> {
        let basket_set: HashSet<String> = basket.iter().cloned().collect();
        let mut recommendations: HashMap<String, f64> = HashMap::new();
        
        // Find applicable rules
        for rule in &self.rules {
            if rule.antecedent.is_subset(&basket_set) && 
               !rule.consequent.is_subset(&basket_set) {
                
                // Score based on confidence and lift
                let score = rule.confidence * rule.lift;
                
                for item in &rule.consequent {
                    if !basket_set.contains(item) {
                        let current_score = recommendations.get(item).unwrap_or(&0.0);
                        recommendations.insert(item.clone(), current_score.max(score));
                    }
                }
            }
        }
        
        // Sort recommendations by score
        let mut sorted_recommendations: Vec<(String, f64)> = recommendations
            .into_iter()
            .collect();
        sorted_recommendations.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        
        // Fallback to popularity-based recommendations if needed
        if sorted_recommendations.len() < n_recommendations {
            let mut popular_items: Vec<(String, f64)> = self.item_popularity
                .iter()
                .filter(|(item, _)| !basket_set.contains(*item))
                .map(|(item, popularity)| (item.clone(), *popularity))
                .collect();
            popular_items.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
            
            for (item, _) in popular_items {
                if !sorted_recommendations.iter().any(|(rec_item, _)| rec_item == &item) {
                    sorted_recommendations.push((item, 0.0));
                }
                
                if sorted_recommendations.len() >= n_recommendations {
                    break;
                }
            }
        }
        
        sorted_recommendations
            .into_iter()
            .take(n_recommendations)
            .map(|(item, _)| item)
            .collect()
    }
}
 
/// Example usage of recommendation system.
pub fn recommendation_example() {
    let transactions = vec![
        vec!["bread".to_string(), "milk".to_string(), "eggs".to_string(), "butter".to_string()],
        vec!["bread".to_string(), "milk".to_string(), "cheese".to_string()],
        vec!["milk".to_string(), "eggs".to_string(), "yogurt".to_string()],
        vec!["bread".to_string(), "butter".to_string(), "jam".to_string()],
        vec!["milk".to_string(), "cereal".to_string(), "banana".to_string()],
        vec!["eggs".to_string(), "bacon".to_string(), "bread".to_string()],
        vec!["cheese".to_string(), "wine".to_string(), "crackers".to_string()],
        vec!["milk".to_string(), "cookies".to_string(), "chocolate".to_string()],
    ];
 
    let mut recommender = AssociationRecommender::new();
    recommender.train(&transactions, 0.1, 0.3);
    
    let test_baskets = vec![
        vec!["milk".to_string(), "bread".to_string()],
        vec!["cheese".to_string(), "wine".to_string()],
        vec!["eggs".to_string(), "bacon".to_string()],
    ];
 
    println!("Recommendation Examples:");
    for (i, basket) in test_baskets.iter().enumerate() {
        let recommendations = recommender.recommend(basket, 3);
        println!("Basket {}: {:?} → Recommendations: {:?}", 
                 i + 1, basket, recommendations);
    }
}

Association rules provide powerful insights into item relationships and enable effective recommendation systems. The Rust implementation offers memory safety and performance benefits while maintaining the mathematical rigor of traditional association rule mining algorithms.