Basic Concept of Market Basket Analysis

Association Rule Mining: Finding Patterns in Data

Have you ever wondered how online stores know what to recommend when you add an item to your cart? Or how supermarkets decide which products to place next to each other? The answer lies in Association Rule Mining, a data mining technique introduced by Agrawal et al. in 1993.

Association Rule Mining is an important data mining model that has been extensively studied in the database and data mining community.

The model initially gained prominence in Market Basket Analysis, where it is used to discover how items purchased by customers are related. It generally assumes all data is categorical (not numeric). The goal of association rule mining is to find all rules that satisfy a user-specified minimum support (minsup) and minimum confidence (minconf). The core idea is simple: if customers often buy X, what else (Y) are they likely to buy?

Key Concepts

· Items (I): The set of all unique articles or products sold.

· Transaction (t): A set of items purchased together in one instance (e.g., one shopping basket).

· Transaction Database (T): The entire dataset, which is a collection of all transactions.

· Association Rule: An implication of the form X → Y, where X and Y are itemsets that have no items in common (X ∩ Y = ∅). X is the antecedent (condition), and Y is the consequent (result).

At its core, the model deals with a set of items I = {i1, i2, …, im} and a transaction database T, which is a set of transactions T = {t1, t2, …, tn}. Each transaction t is a set of items, and t ⊆ I.

For example, let I = {Beef, Chicken, Milk, Cheese, Boots, Clothes}, and let T contain the following seven transactions:

TID   Items Purchased
t1    Beef, Chicken, Milk
t2    Beef, Cheese
t3    Cheese, Boots
t4    Beef, Chicken, Cheese
t5    Beef, Chicken, Clothes, Cheese, Milk
t6    Chicken, Clothes, Milk
t7    Chicken, Milk, Clothes
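The example database maps naturally onto sets. As a minimal sketch (the variable names `items`, `transactions`, `all_valid`, and `n` are illustrative choices, not part of the model itself), the item set I and the database T could be held in Python like this:

```python
# The item set I from the example.
items = {"Beef", "Chicken", "Milk", "Cheese", "Boots", "Clothes"}

# The transaction database T, keyed by transaction ID (TID).
transactions = {
    "t1": {"Beef", "Chicken", "Milk"},
    "t2": {"Beef", "Cheese"},
    "t3": {"Cheese", "Boots"},
    "t4": {"Beef", "Chicken", "Cheese"},
    "t5": {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    "t6": {"Chicken", "Clothes", "Milk"},
    "t7": {"Chicken", "Milk", "Clothes"},
}

# Every transaction t must be a subset of I (t ⊆ I).
all_valid = all(t <= items for t in transactions.values())

# n is the total number of transactions in T.
n = len(transactions)

print(all_valid, n)
```

Sets are a convenient choice here because item order within a basket does not matter: t6 and t7 list the same items in a different order and compare equal as sets.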

An association rule is an implication of the form:

X → Y

where X and Y are sets of items called itemsets (X, Y ⊆ I) that have no items in common (X ∩ Y = ∅). The rule states that when itemset X occurs, itemset Y occurs with a certain probability.
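The two conditions on X and Y can be checked mechanically. The following is a small sketch that assumes itemsets are stored as Python sets; the helper name `is_valid_rule` and the constant `ITEMS` are hypothetical and introduced only for illustration:

```python
# The item set I from the running example.
ITEMS = {"Beef", "Chicken", "Milk", "Cheese", "Boots", "Clothes"}

def is_valid_rule(x, y, items=ITEMS):
    """Return True if X -> Y is well formed: X and Y are subsets of I
    and share no items."""
    return x <= items and y <= items and x.isdisjoint(y)

# Antecedent and consequent are disjoint subsets of I: well formed.
ok = is_valid_rule({"Chicken", "Clothes"}, {"Milk"})

# "Milk" appears on both sides, so the rule is not well formed.
overlap = is_valid_rule({"Chicken", "Milk"}, {"Milk"})

print(ok, overlap)
```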

The strength of a rule is measured using two key metrics:

1. Support (sup): Measures how frequently the itemsets X and Y appear together in the transaction database T. A rule holds with support sup in T if sup% of transactions contain both X and Y (i.e., X ∪ Y).

support = Pr(X ∪ Y) = (X ∪ Y).count / n

where (X ∪ Y).count is the number of transactions in T that contain every item in X ∪ Y, and n is the total number of transactions.

2. Confidence (conf): Measures the probability that Y is present given that X is already present. A rule holds with confidence conf if conf% of transactions that contain X also contain Y.

confidence = Pr(Y | X) = (X ∪ Y).count / X.count
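To make the two metrics concrete, here is a minimal sketch that evaluates them on the seven example transactions. The function names `count`, `support`, and `confidence` are illustrative, and `Fraction` is used only so the results stay exact. For the rule {Chicken, Clothes} → {Milk}, transactions t5, t6, and t7 contain all three items, so the support is 3/7; those same three transactions are the only ones that contain {Chicken, Clothes}, so the confidence is 3/3 = 1.

```python
from fractions import Fraction

# The example transaction database T (t1 through t7).
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def count(itemset, db):
    """Number of transactions in db that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t)

def support(x, y, db):
    """support = (X ∪ Y).count / n"""
    return Fraction(count(x | y, db), len(db))

def confidence(x, y, db):
    """confidence = (X ∪ Y).count / X.count
    Raises ZeroDivisionError if no transaction contains X."""
    return Fraction(count(x | y, db), count(x, db))

x, y = {"Chicken", "Clothes"}, {"Milk"}
print(support(x, y, transactions))     # prints 3/7
print(confidence(x, y, transactions))  # prints 1
```

A rule with lower confidence behaves as expected too: {Beef} → {Chicken} is supported by t1, t4, and t5 (support 3/7), but Beef appears in four transactions, so its confidence is 3/4.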