Frequent Pattern Growth

 FP-Growth:

1. FP-Tree Construction (Data Compression)

The algorithm builds a highly compressed data structure called the Frequent Pattern Tree (FP-Tree) by scanning the transaction database twice.

First Pass: Counting Frequencies

1.      Count Item Support: Scan the entire transaction database to count the support for every single item.

2.      Filter Infrequent Items: Discard all items that do not meet the user-specified minimum support (minsup).

3.      Create Header Table: Sort the remaining frequent items in descending order of their support counts. This sorted list, along with pointers to the tree nodes, forms the Header Table.

Second Pass: Building the FP-Tree

1.      Process Transactions: Scan the transaction database again. For each transaction, remove all infrequent items.

2.      Order Items: Sort the items within each transaction according to the global order established in the Header Table (descending support count).

3.      Insert into Tree: Insert the sorted items of each transaction into the FP-Tree.

o    Nodes are created for items. The count of a node is incremented for every transaction that shares that item prefix.

o    If a transaction shares a prefix with an existing path in the tree, only the count of the existing nodes is incremented. New nodes are only created for items beyond the shared prefix.

4.      Update Header Pointers: A link is drawn from the Header Table to the first occurrence of that item in the tree, and subsequent occurrences are linked to each other using node links. This forms a chain connecting all nodes containing the same item.

 

2. FP-Tree Mining (Pattern Growth)

After the FP-Tree is built, the algorithm mines the frequent itemsets using a divide-and-conquer strategy, working bottom-up from the Header Table. The principle is: find all frequent itemsets ending with a specific item i and combine them.

The mining process is recursive:

1.      Iterate Header Table: Start with the least frequent item i in the Header Table. (Working from the least frequent item first improves efficiency.)

2.      Find Conditional Pattern Bases (CPBs): For the current item i, trace all paths in the FP-Tree that end at a node containing i, using the node links in the Header Table.

o    The Conditional Pattern Base (CPB) for item i is the set of prefix paths of i. The count of each path in the CPB is equal to the count of the i node at the end of that path.

3.      Build Conditional FP-Trees (CFPTs):

o    Treat the CPB as a new, smaller transaction database.

o    Sum the counts for each item within the CPB.

o    Filter out items in the CPB that do not meet the minimum support threshold within that base.

o    Construct a new, smaller Conditional FP-Tree (CFPT) based on the remaining items in the filtered CPB.

4.      Generate Frequent Itemsets:

o    Any frequent path in the CFPT combined with the item i forms a frequent itemset.

o    If the CFPT consists of a single path, all combinations of items on that path plus i are frequent.

5.      Recursion: Recursively apply this mining procedure (steps 2-4) to the newly formed Conditional FP-Tree until the tree is empty or contains only a single path.

By recursively projecting the tree onto conditional pattern bases, FP-Growth effectively finds all frequent itemsets without generating and counting explicit candidates, making it significantly faster than Apriori, especially for dense and large datasets.

The FP-Growth algorithm works by transforming the transaction data into a compressed, tree-like structure, the FP-Tree, and then recursively mining that tree. This avoids the repeated database scans and costly candidate generation that plagues Apriori.

Here is the step-by-step working procedure using a sample database.

 

Example Database and Minimum Support

We will use the following transaction database (T) and set a minimum support threshold (minsup).

TID

Items Purchased

T1

A, B, C

T2

B, C, D

T3

A, C, D

T4

A, B, D

T5

A, B, C, D, E

T6

B, E

Minimum Support (minsup): 3 out of 6 transactions (50%)

 

Step 1: FP-Tree Construction (Two Passes)

Pass 1: Count Frequencies and Order Items

1.      Count Support: Calculate the count for every item in T.

o    A: 4, B: 4, C: 3, D: 4, E: 2

2.      Filter Infrequent Items: Item E (count 2) is below minsup (3), so it is discarded.

3.      Create Header Table and Order: Sort the remaining frequent items in descending order of their support counts. Items with equal counts (A, B, D) are ordered alphabetically.

Item

Support

Global Rank (Order)

A

4

1

B

4

2

D

4

3

C

3

4

Export to Sheets

The Canonical Order for items is A,B,D,C.

Pass 2: Build the FP-Tree

Scan T again. For each transaction, remove infrequent items (E) and sort the rest according to the Canonical Order A,B,D,C. Then, insert the sorted transaction into the tree.

TID

Original Transaction

Sorted, Filtered Transaction

Path in FP-Tree

T1

A, B, C

A,B,C

A(1) B(1) C(1)

T2

B, C, D

B,D,C

Root B(1) D(1) C(1)

T3

A, C, D

A,D,C

Root A(2) D(1) C(1)

T4

A, B, D

A,B,D

Root A(3) B(2) D(1)

T5

A, B, C, D, E

A,B,D,C

Root A(4) B(3) D(2) C(1)

T6

B, E

B

Root B(4)

Export to Sheets

 

Step 2: FP-Tree Mining (Pattern Growth)

We mine the tree recursively, starting from the least frequent item in the Header Table: C, D, B, A.

A. Mining Item 'C'

1.      Find Paths: Trace all paths ending in 'C' using the Header Table links:

o    Path 1: A:4,B:3,D:2,C:1 (Count: 1)

o    Path 2: A:3,D:1,C:1 (Count: 1)

o    Path 3: A:1,B:1,C:1 (Count: 1)

o    Path 4: B:4,D:1,C:1 (Count: 1)

2.      Conditional Pattern Base (CPB): The prefix paths (excluding 'C') with their respective counts:

o    A,B,D: 1

o    A,D: 1

o    A,B: 1

o    B,D: 1

3.      Conditional FP-Tree (CFPT): Count support in the CPB:

o    A: 1+1+1=3 (Frequent)

o    B: 1+1+1=3 (Frequent)

o    D: 1+1=2 (Infrequent Discard)

The filtered CPB is: A,B:1, A:1, A,B:1, B:1.

The resulting CFPT for C contains the single path: A:3 B:3

4.      Frequent Itemsets for 'C': Find all combinations from the CFPT, and append 'C'.

o    {C} (Known: Count 3)

o    {A,C}, {B,C}, {A,B,C}

B. Mining Item 'D'

1.      Find Paths: Paths ending in 'D':

o    Path 1: A:4,B:3,D:2 (Count: 2)

o    Path 2: A:3,D:1 (Count: 1)

o    Path 3: B:4,D:1 (Count: 1)

2.      CPB:

o    A,B: 2

o    A: 1

o    B: 1

3.      CFPT: Count support in the CPB:

o    A: 2+1=3 (Frequent)

o    B: 2+1=3 (Frequent)

The resulting CFPT for D contains the path: A:3 B:2, and a separate path B:1.

4.      Frequent Itemsets for 'D':

o    {D} (Known: Count 4)

o    {A,D}, {B,D}, {A,B,D}

C. Mining Item 'B'

1.      Find Paths: Paths ending in 'B' (excluding those already used for C and D):

o    Path 1: A (Count: 1)

o    Path 2: A (Count: 2)

o    Path 3: A (Count: 1)

2.      CPB: A: 4

3.      CFPT: Contains only A:4

4.      Frequent Itemsets for 'B':

o    {B} (Known: Count 4)

o    {A,B}

D. Mining Item 'A'

The paths for 'A' (the highest-ranked item) are only the root, so its frequent itemset is just {A}.

 

Final Result

By combining the itemsets generated in each step, we obtain the complete set of frequent itemsets:

·         1-Itemsets: {A}, {B}, {C}, {D}

·         2-Itemsets: {A,B}, {A,C}, {A,D}, {B,C}, {B,D}

·         3-Itemsets: {A,B,C}, {A,B,D}