Association Rule Mining: Finding Patterns in Data
Have you ever wondered how online stores know exactly what to recommend when you add an item to your cart? Or how supermarkets strategically place certain products next to each other? The answer lies in Association Rule Mining, a crucial data mining technique developed by Agrawal et al. in 1993.
Association Rule Mining is an important data mining model that has been extensively studied in the database and data mining communities.
The model initially gained prominence in Market Basket Analysis, where it is used to discover how items purchased by customers are related. It generally assumes all data is categorical (not numeric). The goal of association rule mining is to find all rules that satisfy a user-specified minimum support (minsup) and minimum confidence (minconf). The core idea is simple: if customers often buy X, how likely are they to also buy Y?
Key Concepts
· Items (I): The set of all unique articles or products sold.
· Transaction (t): A set of items purchased together in one instance (e.g., one shopping basket).
· Transaction Database (T): The entire dataset, which is a collection of all transactions.
· Association Rule: An implication of the form X → Y, where X and Y are itemsets and have no items in common (X ∩ Y = ∅). X is the antecedent (condition), and Y is the consequent (result).
At its core, the model deals with a set of items I = {i₁, i₂, …, iₘ} and a transaction database T, which is a set of transactions T = {t₁, t₂, …, tₙ}. Each transaction t is a set of items, and t ⊆ I.
For example, take I = {Beef, Chicken, Milk, Cheese, Boots, Clothes} and the following transactions:
| TID | Items Purchased |
|-----|-----------------|
| t1 | Beef, Chicken, Milk |
| t2 | Beef, Cheese |
| t3 | Cheese, Boots |
| t4 | Beef, Chicken, Cheese |
| t5 | Beef, Chicken, Clothes, Cheese, Milk |
| t6 | Chicken, Clothes, Milk |
| t7 | Chicken, Milk, Clothes |
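To experiment with this example, the item set and the transaction database can be written down as plain Python sets. This is only a minimal sketch: the names `ITEMS` and `TRANSACTIONS` are illustrative choices made here, not part of any library.

```python
# Item universe I, as listed in the text.
ITEMS = {"Beef", "Chicken", "Milk", "Cheese", "Boots", "Clothes"}

# Transaction database T: one set of items per transaction ID (TID).
TRANSACTIONS = {
    "t1": {"Beef", "Chicken", "Milk"},
    "t2": {"Beef", "Cheese"},
    "t3": {"Cheese", "Boots"},
    "t4": {"Beef", "Chicken", "Cheese"},
    "t5": {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    "t6": {"Chicken", "Clothes", "Milk"},
    "t7": {"Chicken", "Milk", "Clothes"},
}

# Every transaction t must satisfy t ⊆ I.
for tid, items in TRANSACTIONS.items():
    assert items <= ITEMS, f"{tid} contains items outside I"

# Union of all transactions: the items that were actually purchased.
observed = set().union(*TRANSACTIONS.values())
print(sorted(observed))
```

Because a transaction is a set, the order in which items are listed does not matter, so t6 and t7 compare equal here even though the table lists their items in a different order.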
An association rule is an implication of the form:
X → Y
where X and Y are sets of items called itemsets (X, Y ⊂ I) that have no items in common (X ∩ Y = ∅). The rule states that when itemset X occurs, itemset Y occurs with a certain probability.
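The requirement that X and Y share no items is easy to check in code. The sketch below is one possible way to model a rule; the `Rule` class and its field names are hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A rule X -> Y. Hypothetical helper type, not from any library."""

    antecedent: frozenset  # X, the condition
    consequent: frozenset  # Y, the result

    def __post_init__(self):
        # Enforce X ∩ Y = ∅, as required by the definition.
        overlap = self.antecedent & self.consequent
        if overlap:
            raise ValueError(f"X and Y share items: {sorted(overlap)}")

    def __str__(self):
        x = ", ".join(sorted(self.antecedent))
        y = ", ".join(sorted(self.consequent))
        return f"{{{x}}} -> {{{y}}}"


rule = Rule(frozenset({"Chicken", "Clothes"}), frozenset({"Milk"}))
print(rule)  # {Chicken, Clothes} -> {Milk}
```

Using `frozenset` keeps both sides immutable and hashable, which is convenient if rules are later stored in sets or used as dictionary keys.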
The strength of a rule is measured using two key metrics:
1. Support (sup): Measures how frequently the itemsets X and Y appear together in the transaction database T. A rule holds with support sup in T if sup% of transactions contain both X and Y (i.e., X ∪ Y).
support = Pr(X ∪ Y) = (X ∪ Y).count / n
where n is the total number of transactions.
2. Confidence (conf): Measures the probability that Y is present given that X is already present. A rule holds with confidence conf if conf% of transactions that contain X also contain Y.
confidence = Pr(Y | X) = (X ∪ Y).count / X.count
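Both formulas can be checked by hand on the seven example transactions. The Python sketch below does exactly that for the rule {Chicken, Clothes} → {Milk}. The function names `count`, `support`, and `confidence` are illustrative, and the thresholds `MINSUP` and `MINCONF` are hypothetical values picked for this example, not values given in the text.

```python
# The seven example transactions (t1..t7), each as a set of items.
TRANSACTIONS = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """(X ∪ Y).count / n"""
    return count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """(X ∪ Y).count / X.count; assumes X occurs in at least one transaction."""
    return count(x | y, transactions) / count(x, transactions)


x, y = {"Chicken", "Clothes"}, {"Milk"}
sup = support(x, y, TRANSACTIONS)      # 3 of 7 transactions contain X ∪ Y
conf = confidence(x, y, TRANSACTIONS)  # 3 of the 3 transactions with X also contain Y

# Hypothetical user-specified thresholds.
MINSUP, MINCONF = 0.30, 0.80
print(f"support={sup:.2%}, confidence={conf:.2%}")
print("rule is kept:", sup >= MINSUP and conf >= MINCONF)
```

In this data, Chicken, Clothes, and Milk appear together in t5, t6, and t7, so the support is 3/7 (about 42.86%). Every transaction containing Chicken and Clothes also contains Milk, so the confidence is 3/3 = 100%. A weaker rule such as {Beef} → {Chicken} has the same support (t1, t4, t5) but a confidence of only 3/4 = 75%, because t2 contains Beef without Chicken.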