Basic Concept of Market Basket Analysis

Association Rule Mining: Finding Patterns in Data

Have you ever wondered how online stores know exactly what to recommend when you add an item to your cart? Or how supermarkets strategically place certain products next to each other? The answer lies in Association Rule Mining, a crucial data mining technique introduced by Agrawal et al. in 1993.

Association Rule Mining is an important data mining model that has been studied extensively in the database and data mining communities.

The model first gained prominence in Market Basket Analysis, where it is used to discover how items purchased by customers are related. It generally assumes that all data is categorical rather than numeric. The goal of association rule mining is to find all rules that satisfy a user-specified minimum support (minsup) and minimum confidence (minconf), two measures defined below. The intuition is simple: if customers often buy X, what else (Y) are they likely to buy along with it?

Key Concepts

· Items (I): The set of all unique articles or products sold.

· Transaction (t): A set of items purchased together in one instance (e.g., one shopping basket).

· Transaction Database (T): The entire dataset, which is a collection of all transactions.

· Association Rule: An implication of the form X→Y, where X and Y are itemsets and have no items in common (X∩Y=∅). X is the antecedent (condition), and Y is the consequent (result).

At its core, the model deals with a set of items $I = \{i_1, i_2, \ldots, i_m\}$ and a transaction database $T$, which is a set of transactions $T = \{t_1, t_2, \ldots, t_n\}$. Each transaction $t$ is a set of items such that $t \subseteq I$.

For example, consider the item set I = {Beef, Chicken, Milk, Cheese, Boots, Clothes} and the following transaction database of seven transactions:

TID	Items Purchased
t1	Beef, Chicken, Milk
t2	Beef, Cheese
t3	Cheese, Boots
t4	Beef, Chicken, Cheese
t5	Beef, Chicken, Clothes, Cheese, Milk
t6	Chicken, Clothes, Milk
t7	Chicken, Milk, Clothes
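The item set and transaction database above map naturally onto Python sets. This is a minimal sketch for illustration only: the names `ITEMS` and `TRANSACTIONS` are our own choices, and the loop simply checks the requirement that every transaction satisfies t ⊆ I.

```python
# Item set I from the example (variable names are illustrative).
ITEMS = frozenset({"Beef", "Chicken", "Milk", "Cheese", "Boots", "Clothes"})

# Transaction database T: one frozenset of items per transaction ID.
TRANSACTIONS = {
    "t1": frozenset({"Beef", "Chicken", "Milk"}),
    "t2": frozenset({"Beef", "Cheese"}),
    "t3": frozenset({"Cheese", "Boots"}),
    "t4": frozenset({"Beef", "Chicken", "Cheese"}),
    "t5": frozenset({"Beef", "Chicken", "Clothes", "Cheese", "Milk"}),
    "t6": frozenset({"Chicken", "Clothes", "Milk"}),
    "t7": frozenset({"Chicken", "Milk", "Clothes"}),
}

# Every transaction must satisfy t ⊆ I.
for tid, items in TRANSACTIONS.items():
    assert items <= ITEMS, f"{tid} contains items outside I"

n = len(TRANSACTIONS)  # total number of transactions
print(n)  # 7
```

Frozensets are used so that transactions (and later itemsets) are hashable and can be compared with the subset operator `<=`.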

An association rule is an implication of the form:

$$X \rightarrow Y$$

where X and Y are sets of items called itemsets (X,Y⊂I) and X and Y have no items in common (X∩Y=∅). The rule states that when itemset X occurs, itemset Y occurs with a certain probability.
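Both conditions on a rule can be checked mechanically. In this sketch, `is_valid_rule` and `rules_from_itemset` are hypothetical helper names, and we assume (as is conventional) that neither side of a rule is empty. The second helper lists every way to split one itemset into an antecedent X and a consequent Y that share no items.

```python
from itertools import combinations

ITEMS = frozenset({"Beef", "Chicken", "Milk", "Cheese", "Boots", "Clothes"})

def is_valid_rule(x, y, items=ITEMS):
    """Check X -> Y: both sides non-empty, drawn from I, and disjoint (X ∩ Y = ∅)."""
    x, y = frozenset(x), frozenset(y)
    return bool(x) and bool(y) and x <= items and y <= items and not (x & y)

def rules_from_itemset(itemset):
    """Yield every (X, Y) with X ∪ Y = itemset, X ∩ Y = ∅, and both sides non-empty."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            x = frozenset(antecedent)
            yield x, itemset - x

print(is_valid_rule({"Beef"}, {"Chicken"}))       # True
print(is_valid_rule({"Beef", "Milk"}, {"Milk"}))  # False: Milk is on both sides
for x, y in rules_from_itemset({"Chicken", "Clothes", "Milk"}):
    print(sorted(x), "->", sorted(y))
```

An itemset with k items yields 2^k − 2 such rules, so the three-item set {Chicken, Clothes, Milk} gives six candidate rules.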

The strength of a rule is measured using two key metrics:

1. Support (sup): Measures how frequently the itemsets X and Y appear together in the transaction database T. A rule holds with support sup in T if sup% of transactions contain both X and Y, i.e., contain X∪Y.

$$\text{support} = \Pr(X \cup Y) = \frac{(X \cup Y).\text{count}}{n}$$

where n is the total number of transactions.
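As a quick worked example on the seven transactions in the table, the sketch below computes (X∪Y).count and the support of two rules. The helper names `count` and `support` are illustrative, not taken from any library.

```python
# Transaction database T from the table (t1 .. t7).
TRANSACTIONS = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(x, y, transactions=TRANSACTIONS):
    """support(X -> Y) = (X ∪ Y).count / n."""
    return count(set(x) | set(y), transactions) / len(transactions)

print(count({"Beef", "Chicken"}))                           # 3 (t1, t4, t5)
print(round(support({"Beef"}, {"Chicken"}), 3))             # 0.429
print(round(support({"Chicken", "Clothes"}, {"Milk"}), 3))  # 0.429
```

Because support depends only on X∪Y, the rules {Beef}→{Chicken} and {Chicken}→{Beef} have the same support, 3/7.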

2. Confidence (conf): Measures the probability that Y is present given that X is already present. A rule holds with confidence conf if conf% of transactions that contain X also contain Y.

$$\text{confidence} = \Pr(Y \mid X) = \frac{(X \cup Y).\text{count}}{X.\text{count}}$$
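Confidence uses the same counts, divided by X.count instead of n. The sketch below assumes the same seven transactions and uses illustrative helper names (`count`, `confidence`, `mine_rules`). It computes confidence for a few rules and then finds all rules meeting a given minsup and minconf by brute-force enumeration. This exhaustive search is only practical for toy data; it is not an efficient mining algorithm such as Apriori.

```python
from itertools import combinations

# Transaction database T from the table (t1 .. t7).
TRANSACTIONS = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def confidence(x, y, transactions=TRANSACTIONS):
    """confidence(X -> Y) = (X ∪ Y).count / X.count."""
    return count(set(x) | set(y), transactions) / count(x, transactions)

def mine_rules(minsup, minconf, transactions=TRANSACTIONS):
    """Brute force: test every rule X -> Y built from every itemset of size >= 2."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            union_count = count(itemset, transactions)
            # Skip X ∪ Y if it never occurs or falls below minsup; this also
            # guarantees X.count > 0 in the confidence division below.
            if union_count == 0 or union_count / n < minsup:
                continue
            for k in range(1, size):
                for x in combinations(itemset, k):
                    y = tuple(i for i in itemset if i not in x)
                    if confidence(x, y, transactions) >= minconf:
                        rules.append((frozenset(x), frozenset(y)))
    return rules

print(confidence({"Beef"}, {"Chicken"}))             # 0.75
print(confidence({"Chicken"}, {"Beef"}))             # 0.6
print(confidence({"Chicken", "Clothes"}, {"Milk"}))  # 1.0

rules = mine_rules(minsup=0.3, minconf=0.9)
print(len(rules))  # 6
```

Unlike support, confidence is not symmetric: {Beef}→{Chicken} has confidence 3/4, while {Chicken}→{Beef} has only 3/5 because Chicken appears in five transactions. With minsup = 0.3 and minconf = 0.9, the search returns six rules, among them {Chicken, Clothes}→{Milk} and {Milk}→{Chicken}.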

Dr Umesh Kumar Pandey
