Association Rules
Association rule learning discovers relationships between variables in large transaction databases, identifying frequent patterns and associations between items; the relationships it uncovers describe co-occurrence rather than causation. It is widely used in market basket analysis, recommendation systems, and web usage mining.
Mathematical Foundations
Basic Terminology
- Itemset: A collection of one or more items (e.g., {butter, milk})
- Transaction: A set of items purchased/used together
- Support: Frequency of itemset occurrence in dataset
- Confidence: Likelihood of consequent given antecedent
- Lift: Ratio of observed to expected frequency if independent
Association Rule Structure
An association rule has the form X → Y ("if X then Y"), where the antecedent X and the consequent Y are disjoint itemsets. Example: {butter} → {milk}, meaning transactions that contain butter tend to also contain milk.
Key Metrics
Support: Proportion of transactions containing the itemset: support(X) = |{t ∈ T : X ⊆ t}| / |T|
Confidence: Conditional probability of the consequent given the antecedent: confidence(X → Y) = support(X ∪ Y) / support(X)
Lift: Ratio of observed support to the support expected if antecedent and consequent were independent: lift(X → Y) = confidence(X → Y) / support(Y)
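The implementation below also reports two further rule-quality measures; their standard definitions are:
Conviction: conviction(X → Y) = (1 - support(Y)) / (1 - confidence(X → Y)); values above 1 indicate that X occurs without Y less often than independence would predict, and a rule with confidence 1 has infinite conviction.
Leverage: leverage(X → Y) = support(X ∪ Y) - support(X) * support(Y); the difference between the observed co-occurrence and the co-occurrence expected under independence.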
Rust Implementation
use std::collections::{HashMap, HashSet};
/// Association rule metrics calculator for market basket analysis.
#[derive(Debug, Clone)]
pub struct AssociationRuleMetrics {
/// Transaction database
transactions: Vec<HashSet<String>>,
/// Total number of transactions
n_transactions: usize,
/// Item frequency counts
item_counts: HashMap<String, usize>,
}
impl AssociationRuleMetrics {
/// Create new association rule metrics calculator.
///
/// # Arguments
/// * `transactions` - Vector of transactions, each containing item names
pub fn new(transactions: Vec<Vec<String>>) -> Self {
let transaction_sets: Vec<HashSet<String>> = transactions
.into_iter()
.map(|t| t.into_iter().collect())
.collect();
let n_transactions = transaction_sets.len();
let item_counts = Self::count_items(&transaction_sets);
Self {
transactions: transaction_sets,
n_transactions,
item_counts,
}
}
/// Count frequency of each item across all transactions.
fn count_items(transactions: &[HashSet<String>]) -> HashMap<String, usize> {
let mut item_counts = HashMap::new();
for transaction in transactions {
for item in transaction {
*item_counts.entry(item.clone()).or_insert(0) += 1;
}
}
item_counts
}
/// Calculate support for an itemset.
///
/// # Arguments
/// * `itemset` - Set of items to calculate support for
///
/// # Returns
/// Support value between 0.0 and 1.0
pub fn support(&self, itemset: &HashSet<String>) -> f64 {
// Guard against an empty transaction database to avoid division by zero
if self.n_transactions == 0 {
return 0.0;
}
let count = self.transactions
.iter()
.filter(|transaction| itemset.is_subset(transaction))
.count();
count as f64 / self.n_transactions as f64
}
/// Calculate support for a single item.
pub fn item_support(&self, item: &str) -> f64 {
let mut itemset = HashSet::new();
itemset.insert(item.to_string());
self.support(&itemset)
}
/// Calculate confidence for association rule A → B.
pub fn confidence(&self, antecedent: &HashSet<String>, consequent: &HashSet<String>) -> f64 {
let union: HashSet<String> = antecedent.union(consequent).cloned().collect();
let support_union = self.support(&union);
let support_antecedent = self.support(antecedent);
if support_antecedent > 0.0 {
support_union / support_antecedent
} else {
0.0
}
}
/// Calculate lift for association rule A → B.
pub fn lift(&self, antecedent: &HashSet<String>, consequent: &HashSet<String>) -> f64 {
let confidence = self.confidence(antecedent, consequent);
let support_consequent = self.support(consequent);
if support_consequent > 0.0 {
confidence / support_consequent
} else {
0.0
}
}
/// Calculate conviction for association rule A → B.
pub fn conviction(&self, antecedent: &HashSet<String>, consequent: &HashSet<String>) -> f64 {
let confidence = self.confidence(antecedent, consequent);
let support_consequent = self.support(consequent);
if confidence < 1.0 {
(1.0 - support_consequent) / (1.0 - confidence)
} else {
f64::INFINITY
}
}
/// Calculate leverage (interest) for association rule A → B.
pub fn leverage(&self, antecedent: &HashSet<String>, consequent: &HashSet<String>) -> f64 {
let union: HashSet<String> = antecedent.union(consequent).cloned().collect();
let support_union = self.support(&union);
let support_antecedent = self.support(antecedent);
let support_consequent = self.support(consequent);
support_union - (support_antecedent * support_consequent)
}
}
/// Association rule representation.
#[derive(Debug, Clone)]
pub struct AssociationRule {
pub antecedent: HashSet<String>,
pub consequent: HashSet<String>,
pub support: f64,
pub confidence: f64,
pub lift: f64,
pub conviction: f64,
pub leverage: f64,
}
impl AssociationRule {
/// Create new association rule with calculated metrics.
pub fn new(
antecedent: HashSet<String>,
consequent: HashSet<String>,
metrics: &AssociationRuleMetrics,
) -> Self {
let union: HashSet<String> = antecedent.union(&consequent).cloned().collect();
let support = metrics.support(&union);
let confidence = metrics.confidence(&antecedent, &consequent);
let lift = metrics.lift(&antecedent, &consequent);
let conviction = metrics.conviction(&antecedent, &consequent);
let leverage = metrics.leverage(&antecedent, &consequent);
Self {
antecedent,
consequent,
support,
confidence,
lift,
conviction,
leverage,
}
}
/// Display rule in readable format.
pub fn to_string(&self) -> String {
let antecedent: Vec<String> = self.antecedent.iter().cloned().collect();
let consequent: Vec<String> = self.consequent.iter().cloned().collect();
format!("{{{}}}} → {{{}}}", antecedent.join(", "), consequent.join(", "))
}
}
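A minimal usage sketch of the metrics calculator defined above. The toy transactions and the demo_metrics function name are illustrative only; the values noted in the comments can be checked by hand.
fn demo_metrics() {
    use std::collections::HashSet;

    // Four toy transactions
    let transactions = vec![
        vec!["a".to_string(), "b".to_string()],
        vec!["a".to_string(), "c".to_string()],
        vec!["a".to_string(), "b".to_string(), "c".to_string()],
        vec!["b".to_string()],
    ];
    let metrics = AssociationRuleMetrics::new(transactions);

    let a: HashSet<String> = HashSet::from(["a".to_string()]);
    let b: HashSet<String> = HashSet::from(["b".to_string()]);
    let ab: HashSet<String> = a.union(&b).cloned().collect();

    // support({a}) = 3/4 = 0.75, support({a, b}) = 2/4 = 0.50
    println!("support(a) = {:.2}", metrics.item_support("a"));
    println!("support(a, b) = {:.2}", metrics.support(&ab));
    // confidence(a → b) = 0.50 / 0.75 ≈ 0.67, lift ≈ 0.67 / 0.75 ≈ 0.89
    println!("confidence(a → b) = {:.2}", metrics.confidence(&a, &b));
    println!("lift(a → b) = {:.2}", metrics.lift(&a, &b));

    // Bundle everything into an AssociationRule and print it
    let rule = AssociationRule::new(a, b, &metrics);
    println!("{}", rule.to_string());
}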
Apriori Algorithm
The Apriori algorithm finds frequent itemsets level by level, relying on the downward closure (anti-monotonicity) property: if an itemset is infrequent, all of its supersets are also infrequent. For example, if {bread, milk} falls below the support threshold, then {bread, milk, eggs} can be pruned without ever being counted.
/// Apriori algorithm implementation for frequent itemset mining.
pub struct AprioriAlgorithm {
/// Minimum support threshold
min_support: f64,
/// Minimum confidence threshold
min_confidence: f64,
/// Frequent itemsets by level, stored as (itemset, support) pairs.
/// (HashSet<String> does not implement Hash, so itemsets cannot themselves be HashMap keys.)
frequent_itemsets: HashMap<usize, Vec<(HashSet<String>, f64)>>,
/// Generated association rules
rules: Vec<AssociationRule>,
}
impl AprioriAlgorithm {
/// Create new Apriori algorithm instance.
pub fn new(min_support: f64, min_confidence: f64) -> Self {
Self {
min_support,
min_confidence,
frequent_itemsets: HashMap::new(),
rules: Vec::new(),
}
}
/// Generate candidate itemsets of size k.
fn generate_candidates(&self, frequent_k_minus_1: &[HashSet<String>]) -> Vec<HashSet<String>> {
let mut candidates = Vec::new();
let items: Vec<&HashSet<String>> = frequent_k_minus_1.iter().collect();
for i in 0..items.len() {
for j in (i + 1)..items.len() {
let union: HashSet<String> = items[i].union(items[j]).cloned().collect();
// Only add if union has exactly k items
if union.len() == items[i].len() + 1 {
// Check if all (k-1)-subsets are frequent
if self.has_frequent_subsets(&union, frequent_k_minus_1) {
candidates.push(union);
}
}
}
}
// Remove duplicates
let mut unique_candidates = Vec::new();
for candidate in candidates {
if !unique_candidates.contains(&candidate) {
unique_candidates.push(candidate);
}
}
unique_candidates
}
/// Check if all (k-1)-subsets of itemset are frequent.
fn has_frequent_subsets(&self, itemset: &HashSet<String>, frequent_k_minus_1: &[HashSet<String>]) -> bool {
let k = itemset.len();
if k <= 1 {
return true;
}
// Generate all (k-1)-subsets
let items: Vec<String> = itemset.iter().cloned().collect();
for i in 0..items.len() {
let mut subset = itemset.clone();
subset.remove(&items[i]);
if !frequent_k_minus_1.contains(&subset) {
return false;
}
}
true
}
/// Calculate support for each candidate itemset.
/// Returns (itemset, support) pairs, since HashSet<String> cannot serve as a HashMap key.
fn calculate_support(&self, itemsets: &[HashSet<String>], transactions: &[HashSet<String>]) -> Vec<(HashSet<String>, f64)> {
itemsets
.iter()
.map(|itemset| {
let count = transactions
.iter()
.filter(|transaction| itemset.is_subset(transaction))
.count();
(itemset.clone(), count as f64 / transactions.len() as f64)
})
.collect()
}
/// Find all frequent itemsets using Apriori algorithm.
pub fn find_frequent_itemsets(&mut self, transactions: &[Vec<String>]) -> &HashMap<usize, Vec<(HashSet<String>, f64)>> {
let transaction_sets: Vec<HashSet<String>> = transactions
.iter()
.map(|t| t.iter().cloned().collect())
.collect();
self.frequent_itemsets.clear();
// Generate 1-itemsets
let mut all_items = HashSet::new();
for transaction in &transaction_sets {
for item in transaction {
let mut itemset = HashSet::new();
itemset.insert(item.clone());
all_items.insert(itemset);
}
}
let candidates: Vec<HashSet<String>> = all_items.into_iter().collect();
let support_map = self.calculate_support(&candidates, &transaction_sets);
// Filter frequent 1-itemsets
let frequent_1: Vec<(HashSet<String>, f64)> = support_map
.into_iter()
.filter(|(_, support)| *support >= self.min_support)
.collect();
if !frequent_1.is_empty() {
self.frequent_itemsets.insert(1, frequent_1);
}
// Generate k-itemsets for k > 1
let mut k = 2;
while let Some(frequent_k_minus_1) = self.frequent_itemsets.get(&(k - 1)) {
let frequent_k_minus_1_sets: Vec<HashSet<String>> = frequent_k_minus_1.iter().map(|(itemset, _)| itemset.clone()).collect();
let candidates = self.generate_candidates(&frequent_k_minus_1_sets);
if candidates.is_empty() {
break;
}
let support_map = self.calculate_support(&candidates, &transaction_sets);
let frequent_k: Vec<(HashSet<String>, f64)> = support_map
.into_iter()
.filter(|(_, support)| *support >= self.min_support)
.collect();
if frequent_k.is_empty() {
break;
}
self.frequent_itemsets.insert(k, frequent_k);
k += 1;
}
&self.frequent_itemsets
}
/// Generate association rules from frequent itemsets.
pub fn generate_rules(&mut self, transactions: &[Vec<String>]) -> &Vec<AssociationRule> {
let metrics = AssociationRuleMetrics::new(transactions.to_vec());
self.rules.clear();
// Generate rules from itemsets of size 2 and above
for k in 2..=self.frequent_itemsets.len() {
if let Some(frequent_k) = self.frequent_itemsets.get(&k) {
for (itemset, _) in frequent_k {
// Generate all possible antecedent-consequent splits
let items: Vec<String> = itemset.iter().cloned().collect();
// Generate all non-empty proper subsets as antecedents
for i in 1..(1 << items.len()) - 1 {
let mut antecedent = HashSet::new();
let mut consequent = HashSet::new();
for (j, item) in items.iter().enumerate() {
if (i >> j) & 1 == 1 {
antecedent.insert(item.clone());
} else {
consequent.insert(item.clone());
}
}
let rule = AssociationRule::new(antecedent, consequent, &metrics);
if rule.confidence >= self.min_confidence {
self.rules.push(rule);
}
}
}
}
}
// Sort rules by confidence * lift
self.rules.sort_by(|a, b| (b.confidence * b.lift).partial_cmp(&(a.confidence * a.lift)).unwrap());
&self.rules
}
}
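To make the rule-generation step concrete: for each frequent k-itemset (k ≥ 2), every non-empty proper subset is tried as an antecedent via a bitmask over the items, with the remaining items forming the consequent. For a 3-itemset {a, b, c} this enumerates 2^3 - 2 = 6 candidate rules: {a} → {b, c}, {b} → {a, c}, {c} → {a, b}, {a, b} → {c}, {a, c} → {b}, and {b, c} → {a}; each is kept only if its confidence meets min_confidence.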
FP-Growth Algorithm
FP-Growth is typically more efficient than Apriori because it avoids explicit candidate generation: it compresses the transaction database into a compact prefix tree (the FP-tree), inserting each transaction's frequent items in descending frequency order, and then mines frequent patterns directly from the tree.
/// Node in FP-Tree structure.
#[derive(Debug, Clone)]
pub struct FPNode {
/// Item name
item: String,
/// Support count
count: usize,
/// Parent node (not maintained in this simplified, Box-owned tree)
parent: Option<Box<FPNode>>,
/// Child nodes
children: HashMap<String, Box<FPNode>>,
/// Link to the next node carrying the same item (not maintained in this simplified version)
next_link: Option<Box<FPNode>>,
}
impl FPNode {
/// Create new FP-Tree node.
pub fn new(item: String, count: usize) -> Self {
Self {
item,
count,
parent: None,
children: HashMap::new(),
next_link: None,
}
}
/// Increment node count.
pub fn increment(&mut self, count: usize) {
self.count += count;
}
}
/// FP-Tree implementation for efficient pattern mining.
#[derive(Debug)]
pub struct FPTree {
/// Root node
root: FPNode,
/// Header table mapping items to their first tree node (left unpopulated in this simplified version)
header_table: HashMap<String, Box<FPNode>>,
/// Minimum support threshold
min_support: f64,
/// Item frequencies
item_frequencies: HashMap<String, usize>,
}
impl FPTree {
/// Create new FP-Tree.
pub fn new(min_support: f64) -> Self {
Self {
root: FPNode::new("null".to_string(), 0),
header_table: HashMap::new(),
min_support,
item_frequencies: HashMap::new(),
}
}
/// Build FP-Tree from transactions.
pub fn build(&mut self, transactions: &[Vec<String>]) {
// First pass: count item frequencies
let mut item_counts = HashMap::new();
for transaction in transactions {
for item in transaction {
*item_counts.entry(item.clone()).or_insert(0) += 1;
}
}
// Filter frequent items; round the count threshold up so kept items truly meet min_support
let min_count = (self.min_support * transactions.len() as f64).ceil() as usize;
self.item_frequencies = item_counts
.into_iter()
.filter(|(_, count)| *count >= min_count)
.collect();
// Second pass: build tree
for transaction in transactions {
// Filter and sort items by frequency (descending)
let mut frequent_items: Vec<String> = transaction
.iter()
.filter(|item| self.item_frequencies.contains_key(*item))
.cloned()
.collect();
frequent_items.sort_by(|a, b| {
// Descending frequency, ties broken alphabetically so all transactions share one item order
self.item_frequencies[b]
.cmp(&self.item_frequencies[a])
.then_with(|| a.cmp(b))
});
if !frequent_items.is_empty() {
self.insert_transaction(&frequent_items, 1);
}
}
}
/// Insert a transaction into the FP-Tree, walking down (and extending) its prefix path.
fn insert_transaction(&mut self, items: &[String], count: usize) {
let mut current: &mut FPNode = &mut self.root;
for item in items {
// Descend to the child node for this item, creating it with count 0 if it is new
let child = current
.children
.entry(item.clone())
.or_insert_with(|| Box::new(FPNode::new(item.clone(), 0)));
child.increment(count);
current = &mut **child;
}
// Note: parent/next_link pointers and the header table are not maintained here;
// with Box-owned children those links cannot be shared. A full implementation
// would use Rc<RefCell<FPNode>> or arena indices for those links.
}
}
/// FP-Growth algorithm implementation.
pub struct FPGrowth {
/// Minimum support threshold
min_support: f64,
/// Found frequent patterns
frequent_patterns: Vec<(HashSet<String>, f64)>,
}
impl FPGrowth {
/// Create new FP-Growth instance.
pub fn new(min_support: f64) -> Self {
Self {
min_support,
frequent_patterns: Vec::new(),
}
}
/// Mine frequent patterns using FP-Growth algorithm.
pub fn mine_patterns(&mut self, transactions: &[Vec<String>]) -> &Vec<(HashSet<String>, f64)> {
self.frequent_patterns.clear();
// Build FP-Tree
let mut fp_tree = FPTree::new(self.min_support);
fp_tree.build(transactions);
// Mine patterns from the tree (simplified: only frequent 1-itemsets are reported;
// a full FP-Growth would recursively mine conditional FP-trees for longer patterns)
for (item, frequency) in &fp_tree.item_frequencies {
let support = *frequency as f64 / transactions.len() as f64;
if support >= self.min_support {
let mut itemset = HashSet::new();
itemset.insert(item.clone());
self.frequent_patterns.push((itemset, support));
}
}
&self.frequent_patterns
}
}
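A brief usage sketch of the simplified miner above. The fp_growth_example function and its toy data are illustrative only; as noted in the code, this version reports frequent single items rather than longer patterns.
fn fp_growth_example() {
    let transactions = vec![
        vec!["bread".to_string(), "milk".to_string()],
        vec!["bread".to_string(), "butter".to_string()],
        vec!["milk".to_string(), "butter".to_string()],
        vec!["bread".to_string(), "milk".to_string(), "butter".to_string()],
    ];
    // With min_support = 0.5, an item must appear in at least 2 of the 4 transactions
    let mut fp_growth = FPGrowth::new(0.5);
    for (itemset, support) in fp_growth.mine_patterns(&transactions) {
        let items: Vec<String> = itemset.iter().cloned().collect();
        println!("{{{}}}: {:.2}", items.join(", "), support);
    }
}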
Practical Applications
Market Basket Analysis Example
use crate::machine_learning::unsupervised::association_rules::{AssociationRuleMetrics, AprioriAlgorithm};
/// Comprehensive market basket analysis implementation.
pub fn analyze_market_basket() {
// Sample market basket transactions
let transactions = vec![
vec!["bread".to_string(), "milk".to_string(), "eggs".to_string()],
vec!["bread".to_string(), "butter".to_string(), "milk".to_string()],
vec!["milk".to_string(), "eggs".to_string(), "cheese".to_string()],
vec!["bread".to_string(), "milk".to_string(), "cheese".to_string()],
vec!["butter".to_string(), "eggs".to_string(), "cheese".to_string()],
vec!["bread".to_string(), "butter".to_string(), "eggs".to_string()],
vec!["milk".to_string(), "cheese".to_string()],
vec!["bread".to_string(), "milk".to_string(), "butter".to_string(), "eggs".to_string()],
vec!["cheese".to_string(), "eggs".to_string()],
vec!["bread".to_string(), "milk".to_string()],
];
// Calculate basic metrics
let metrics = AssociationRuleMetrics::new(transactions.clone());
// Create itemsets for analysis
let mut bread_set = std::collections::HashSet::new();
bread_set.insert("bread".to_string());
let mut milk_set = std::collections::HashSet::new();
milk_set.insert("milk".to_string());
let mut bread_milk_set = std::collections::HashSet::new();
bread_milk_set.insert("bread".to_string());
bread_milk_set.insert("milk".to_string());
println!("Market Basket Analysis Results:");
println!("Support(bread): {:.3}", metrics.support(&bread_set));
println!("Support(milk): {:.3}", metrics.support(&milk_set));
println!("Support(bread, milk): {:.3}", metrics.support(&bread_milk_set));
println!("Confidence(bread → milk): {:.3}", metrics.confidence(&bread_set, &milk_set));
println!("Lift(bread → milk): {:.3}", metrics.lift(&bread_set, &milk_set));
println!("Conviction(bread → milk): {:.3}", metrics.conviction(&bread_set, &milk_set));
// Apply Apriori algorithm
let mut apriori = AprioriAlgorithm::new(0.2, 0.5);
let frequent_itemsets = apriori.find_frequent_itemsets(&transactions);
println!("\nFrequent Itemsets:");
for (k, itemsets) in frequent_itemsets {
println!("\nSize {}:", k);
for (itemset, support) in itemsets {
let items: Vec<String> = itemset.iter().cloned().collect();
println!(" {{{}}}: {:.3}", items.join(", "), support);
}
}
// Generate association rules
let rules = apriori.generate_rules(&transactions);
println!("\nTop Association Rules:");
println!("{:<20} {:<15} {:<8} {:<10} {:<8}",
"Antecedent", "Consequent", "Support", "Confidence", "Lift");
println!("{}", "-".repeat(70));
for rule in rules.iter().take(10) {
let antecedent: Vec<String> = rule.antecedent.iter().cloned().collect();
let consequent: Vec<String> = rule.consequent.iter().cloned().collect();
println!("{:<20} → {:<13} {:<8.3} {:<10.3} {:<8.3}",
antecedent.join(", "),
consequent.join(", "),
rule.support,
rule.confidence,
rule.lift);
}
}
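As a sanity check, the basic metrics can be verified by hand from the ten transactions above: bread appears in 6 of them (support 0.6), milk in 7 (support 0.7), and both together in 5 (support 0.5), so confidence(bread → milk) = 0.5 / 0.6 ≈ 0.833, lift ≈ 0.833 / 0.7 ≈ 1.19, and conviction = (1 - 0.7) / (1 - 0.833) ≈ 1.8, indicating a mildly positive association.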
Recommendation System
/// Association rules-based recommendation system.
pub struct AssociationRecommender {
/// Trained association rules
rules: Vec<AssociationRule>,
/// Item popularity for fallback recommendations
item_popularity: HashMap<String, f64>,
}
impl AssociationRecommender {
/// Create new recommendation system.
pub fn new() -> Self {
Self {
rules: Vec::new(),
item_popularity: HashMap::new(),
}
}
/// Train recommender on transaction data.
pub fn train(&mut self, transactions: &[Vec<String>], min_support: f64, min_confidence: f64) {
let mut apriori = AprioriAlgorithm::new(min_support, min_confidence);
apriori.find_frequent_itemsets(transactions);
self.rules = apriori.generate_rules(transactions).clone();
// Calculate item popularity for fallback
let mut item_counts = HashMap::new();
for transaction in transactions {
for item in transaction {
*item_counts.entry(item.clone()).or_insert(0) += 1;
}
}
let total_transactions = transactions.len() as f64;
for (item, count) in item_counts {
self.item_popularity.insert(item, count as f64 / total_transactions);
}
}
/// Generate recommendations for a given basket.
pub fn recommend(&self, basket: &[String], n_recommendations: usize) -> Vec<String> {
let basket_set: HashSet<String> = basket.iter().cloned().collect();
let mut recommendations: HashMap<String, f64> = HashMap::new();
// Find applicable rules
for rule in &self.rules {
if rule.antecedent.is_subset(&basket_set) &&
!rule.consequent.is_subset(&basket_set) {
// Score based on confidence and lift
let score = rule.confidence * rule.lift;
for item in &rule.consequent {
if !basket_set.contains(item) {
// Keep the best score seen for this item across all matching rules
let entry = recommendations.entry(item.clone()).or_insert(0.0);
*entry = entry.max(score);
}
}
}
}
// Sort recommendations by score
let mut sorted_recommendations: Vec<(String, f64)> = recommendations
.into_iter()
.collect();
sorted_recommendations.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
// Fallback to popularity-based recommendations if needed
if sorted_recommendations.len() < n_recommendations {
let mut popular_items: Vec<(String, f64)> = self.item_popularity
.iter()
.filter(|(item, _)| !basket_set.contains(*item))
.map(|(item, popularity)| (item.clone(), *popularity))
.collect();
popular_items.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
for (item, _) in popular_items {
if !sorted_recommendations.iter().any(|(rec_item, _)| rec_item == &item) {
sorted_recommendations.push((item, 0.0));
}
if sorted_recommendations.len() >= n_recommendations {
break;
}
}
}
sorted_recommendations
.into_iter()
.take(n_recommendations)
.map(|(item, _)| item)
.collect()
}
}
/// Example usage of recommendation system.
pub fn recommendation_example() {
let transactions = vec![
vec!["bread".to_string(), "milk".to_string(), "eggs".to_string(), "butter".to_string()],
vec!["bread".to_string(), "milk".to_string(), "cheese".to_string()],
vec!["milk".to_string(), "eggs".to_string(), "yogurt".to_string()],
vec!["bread".to_string(), "butter".to_string(), "jam".to_string()],
vec!["milk".to_string(), "cereal".to_string(), "banana".to_string()],
vec!["eggs".to_string(), "bacon".to_string(), "bread".to_string()],
vec!["cheese".to_string(), "wine".to_string(), "crackers".to_string()],
vec!["milk".to_string(), "cookies".to_string(), "chocolate".to_string()],
];
let mut recommender = AssociationRecommender::new();
recommender.train(&transactions, 0.1, 0.3);
let test_baskets = vec![
vec!["milk".to_string(), "bread".to_string()],
vec!["cheese".to_string(), "wine".to_string()],
vec!["eggs".to_string(), "bacon".to_string()],
];
println!("Recommendation Examples:");
for (i, basket) in test_baskets.iter().enumerate() {
let recommendations = recommender.recommend(basket, 3);
println!("Basket {}: {:?} → Recommendations: {:?}",
i + 1, basket, recommendations);
}
}
Association rules provide powerful insights into item relationships and enable effective recommendation systems. The Rust implementation offers memory safety and performance benefits while maintaining the mathematical rigor of traditional association rule mining algorithms.