Apriori Algorithm

The Apriori Algorithm and its Core Property

The Apriori Algorithm is one of the best-known algorithms for association rule mining. It works in two main steps: 

Pass 1: Find all frequent itemsets (itemsets that satisfy minsup).

Pass 2: Use the frequent itemsets to generate association rules (rules that satisfy minconf).

The Apriori Property (Downward Closure)

The key to the Apriori algorithm's efficiency is the Apriori property (or downward closure property): Any subset of a frequent itemset must also be a frequent itemset.

This property allows the algorithm to perform an iterative, level-wise search. It starts by finding all frequent 1-itemsets (F_1), then uses F_1 to generate and test candidates for F_2, and so on.

The logical consequence of the Apriori property is that if an itemset is infrequent (its support is less than minsup), then all of its supersets must also be infrequent. This is the basis for the prune step in candidate generation, which significantly reduces the number of itemsets that need to be counted.
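As a concrete illustration, here is a minimal Python sketch of the prune test implied by this property (the function name passes_apriori_check and the sample F_2 below are our own hypothetical choices, not part of the algorithm's standard presentation): a candidate k-itemset is kept only if every one of its (k−1)-subsets is already known to be frequent.

```python
from itertools import combinations

def passes_apriori_check(candidate, prev_frequent):
    """Downward closure: every (k-1)-subset of a frequent k-itemset must
    itself be frequent, so a candidate with any infrequent (k-1)-subset
    can be discarded without ever counting its support."""
    k = len(candidate)
    return all(frozenset(sub) in prev_frequent
               for sub in combinations(candidate, k - 1))

# Hypothetical F_2: the frequent 2-itemsets.
F2 = {frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"B", "C"})}
print(passes_apriori_check(("A", "B", "C"), F2))  # True  -> worth counting
print(passes_apriori_check(("A", "B", "D"), F2))  # False -> prune ({A, D} is not in F2)
```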

Simplified Visual of the Downward Closure Property

This structure illustrates the relationship between frequent itemsets based on the Apriori property:

| Itemset Level | Itemsets | Frequent / Infrequent | Downward Closure Principle |
| --- | --- | --- | --- |
| 3-Itemset | {A, B, C} | Frequent | All subsets must also be frequent. |
| 2-Itemsets | {A, B}, {A, C}, {B, C} | Frequent | |
| 1-Itemsets | {A}, {B}, {C} | Frequent | |
| 3-Itemset | {X, Y, Z} | Infrequent | All of its supersets (such as {X, Y, Z, W}) must also be infrequent, so none of them ever needs to be generated or counted. |

In the Apriori algorithm, if any (k−1)-sized subset of a candidate k-itemset is not in F_(k-1) (the set of frequent (k−1)-itemsets), then the candidate itemset is removed in the prune step.
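A rough Python sketch of this join-then-prune candidate generation (generate_and_prune is an illustrative name, and itemsets are assumed to be kept as lexicographically sorted tuples; both are choices of this sketch):

```python
from itertools import combinations

def generate_and_prune(prev_frequent, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets via the
    join step, then prune any candidate with an infrequent (k-1)-subset."""
    prev = sorted(prev_frequent)              # sorted tuples of length k-1
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            # Join: the two itemsets must agree on their first k-2 items.
            if prev[i][:k - 2] == prev[j][:k - 2]:
                candidates.add(tuple(sorted(set(prev[i]) | set(prev[j]))))
    # Prune: every (k-1)-subset must already be in F_(k-1).
    return {c for c in candidates
            if all(sub in prev_frequent for sub in combinations(c, k - 1))}

# Hypothetical F_2, items as sorted tuples:
F2 = {("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")}
print(generate_and_prune(F2, 3))  # ("A","B","C") survives; ("B","C","D") is pruned
```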

Steps of Apriori Algorithm

The Apriori Algorithm is the classical, seminal algorithm for Association Rule Mining. It works in two main, distinct steps: Finding Frequent Itemsets and Generating Association Rules.

The algorithm's efficiency relies on the Apriori Property (or Downward Closure Property): Any subset of a frequent itemset must also be frequent.

Step 1: Finding All Frequent Itemsets

This phase uses a level-wise search to find all itemsets whose support is greater than or equal to the user-specified Minimum Support (minsup).

| Step | Action | Explanation |
| --- | --- | --- |
| 1.1: Initialization | Find all frequent 1-itemsets (F_1). | Scan the database to count the support of every single item. Discard any item with a count below minsup; the remaining items form F_1. |
| 1.2: Iteration (k = 2, 3, …) | Repeat the cycle of candidate generation, pruning, and support counting until no new frequent itemsets are found. | |
| 1.3: Candidate Generation (C_k) | Generate candidate k-itemsets (C_k) from F_(k-1). | Perform a join operation: combine any two frequent itemsets from F_(k-1) that share their first k-2 items to form a k-item candidate. |
| 1.4: Pruning (C_k to C'_k) | Prune the candidate set C_k to create C'_k. | Apply the Apriori property: if any (k-1)-sized subset of a candidate k-itemset in C_k is not found in F_(k-1), that candidate cannot be frequent and is removed. |
| 1.5: Support Counting (F_k) | Find the frequent k-itemsets F_k. | Scan the transaction database (T) to count the actual support of the remaining candidates in C'_k. All candidates whose count meets minsup are added to F_k. |
| 1.6: Termination | Stop when the set F_k is empty. | The union of all F_k found (F = F_1 ∪ F_2 ∪ …) is the complete set of frequent itemsets. |
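Putting steps 1.1 through 1.6 together, a compact, self-contained sketch of the whole frequent-itemset phase might look as follows (find_frequent_itemsets is our illustrative name, and it takes an absolute support count rather than a fraction; both are assumptions of this sketch, not part of the original pseudocode):

```python
from collections import Counter
from itertools import combinations

def find_frequent_itemsets(transactions, minsup_count):
    """Level-wise search over itemset sizes (steps 1.1-1.6), a minimal sketch.
    transactions: list of sets of items; minsup_count: absolute threshold."""
    # 1.1 Initialization: one scan to count individual items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= minsup_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:                       # 1.2 iterate; 1.6 stop when F_k is empty
        # 1.3 Candidate generation: join (k-1)-itemsets sharing their first k-2 items.
        prev = sorted(tuple(sorted(s)) for s in frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                if prev[i][:k - 2] == prev[j][:k - 2]:
                    candidates.add(frozenset(prev[i]) | frozenset(prev[j]))
        # 1.4 Pruning: discard candidates with any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        # 1.5 Support counting: one scan of the database for this level.
        level = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    level[c] += 1
        frequent = {c: n for c, n in level.items() if n >= minsup_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent                   # F = F_1 ∪ F_2 ∪ ... with support counts

# A toy run on hypothetical data:
T = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
print(find_frequent_itemsets(T, minsup_count=3))
```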


Step 2: Generating Association Rules

In this phase, all frequent itemsets (F) found in Step 1 are used to generate high-confidence rules that satisfy the user-specified Minimum Confidence (minconf).

  1. Iterate Over Frequent Itemsets: For every frequent itemset X ∈ F (where |X| ≥ 2), generate all possible non-empty proper subsets of X.
  2. Generate Rules: For every non-empty proper subset A ⊂ X, form an association rule R:

A → (X − A)

The set A is the antecedent, and B = (X - A) is the consequent.

  3. Check Confidence: Calculate the confidence of the rule R using the support counts recorded in Step 1 (no need to rescan the database):

Confidence(A → B) = Support(A ∪ B) / Support(A) = Support(X) / Support(A)

  4. Output Rules: If Confidence(A → B) ≥ minconf, the rule is considered strong and is output.

This process ensures that every generated rule is both frequent (because it is derived from a frequent itemset X) and reliable (because it meets the minconf threshold).
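A short sketch of this rule-generation phase in Python (generate_rules is an illustrative name; support is assumed to be a mapping from frozenset itemsets to their counts, such as the output of the Step 1 sketch above):

```python
from itertools import chain, combinations

def generate_rules(support, minconf):
    """Step 2, a sketch: emit every rule A -> X - A whose confidence
    meets minconf, using only the support counts recorded in Step 1."""
    rules = []
    for X, sup_X in support.items():
        if len(X) < 2:                    # a rule needs both sides non-empty
            continue
        # All non-empty proper subsets A of X (step 1 of this phase).
        subsets = chain.from_iterable(combinations(X, r) for r in range(1, len(X)))
        for A in map(frozenset, subsets):
            conf = sup_X / support[A]     # Support(X) / Support(A); no rescan
            if conf >= minconf:           # step 4: keep only strong rules
                rules.append((set(A), set(X - A), conf))
    return rules

# Hypothetical supports: A and B each occur 4 times, {A, B} occurs 3 times.
support = {frozenset("A"): 4, frozenset("B"): 4, frozenset({"A", "B"}): 3}
print(generate_rules(support, minconf=0.7))
# e.g. [({'A'}, {'B'}, 0.75), ({'B'}, {'A'}, 0.75)] (order may vary)
```

Note that support[A] can never raise a KeyError here: by downward closure, every subset of a frequent itemset is itself frequent, so its count was already recorded in Step 1.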