Hierarchical Clustering

Market Segmentation Case Study — Hierarchical Clustering for Targeted Marketing


This case study demonstrates a step-by-step, business-focused approach to market segmentation using hierarchical clustering. It includes a small sample dataset for manual calculation, the math behind distance metrics and linkage criteria, a reproducible Python pipeline that produces a dendrogram and cluster labels, an example output table, a visualization, and a practical interpretation for targeted marketing.


1. Business Problem

A retail marketer wants to group customers into meaningful segments so marketing resources (promotions, loyalty offers, personalized campaigns) can be targeted efficiently. The objective is to find hierarchical groups—so business stakeholders can view segmentation at multiple granularities (e.g., 2 groups for high-level strategy, 4–6 groups for operational targeting).

2. Why Hierarchical Clustering?

Hierarchical (agglomerative) clustering builds a tree (dendrogram) of nested clusters. Advantages for business use include:

  • Visualization of cluster relationships (dendrogram) helps stakeholders choose the level of granularity.
  • No need to pre-specify the number of clusters (you cut the tree later).
  • Works well with small-to-medium datasets and supports several linkage criteria (single, complete, average, Ward), so the notion of similarity can be matched to the business definition.

3. Sample Dataset (toy) — Customers (for manual calculation)

We use a very small dataset of five customers with two numeric features for clarity: Recency (days since last purchase) and Monetary (average monthly spend in USD). Save as customers_manual.csv if you want to follow the manual steps.

CustomerID | Recency (days) | MonthlySpend (USD)
C1         | 10             | 400
C2         | 20             | 300
C3         | 200            | 50
C4         | 180            | 80
C5         | 15             | 380

This data intentionally has two natural groups: recent high spenders (C1, C2, C5) and long-ago low spenders (C3, C4).

4. Distance Metric & Linkage — Equations

Euclidean distance (2D):
d(x, y) = √((x₁ − y₁)² + (x₂ − y₂)²)
Single linkage (nearest):
d(A,B) = min { d(a,b) : a ∈ A, b ∈ B }
Complete linkage (farthest):
d(A,B) = max { d(a,b) : a ∈ A, b ∈ B }
Average linkage:
d(A,B) = (1 / (|A||B|)) ∑_{a∈A} ∑_{b∈B} d(a,b)
Ward's method (minimize within-cluster variance):
Δ(A,B) = ((|A||B|) / (|A|+|B|)) * ||μ_A − μ_B||² where μ are cluster centroids.

Ward's method is commonly used when clusters should minimize variance (producing compact, spherical clusters). We will illustrate manual agglomeration with Euclidean distances and average linkage for clarity, but the Python code below uses Ward and dendrograms so you can experiment.
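As a concrete illustration of the four linkage rules, the sketch below evaluates each definition with NumPy on two small clusters, using the C1/C5 and C3/C4 point pairs from the toy dataset (raw, unscaled values):

```python
import numpy as np

# Two small clusters of 2-D points (Recency, MonthlySpend), taken from the toy data
A = np.array([[10.0, 400.0], [15.0, 380.0]])   # C1, C5
B = np.array([[200.0, 50.0], [180.0, 80.0]])   # C3, C4

# All pairwise Euclidean distances between members of A and members of B
D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

single   = D.min()    # nearest pair of points across the two clusters
complete = D.max()    # farthest pair
average  = D.mean()   # mean over all |A|*|B| pairs

# Ward's criterion: increase in within-cluster variance if A and B are merged
mu_A, mu_B = A.mean(axis=0), B.mean(axis=0)
ward = (len(A) * len(B)) / (len(A) + len(B)) * np.sum((mu_A - mu_B) ** 2)

print(f'single={single:.2f} complete={complete:.2f} average={average:.2f}')
```

For these two clusters the single, complete, and average distances come out to roughly 342.38, 398.25, and 370.33, i.e. the min, max, and mean of the four cross-cluster pairwise distances.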

5. Manual Step-by-Step Hierarchical (Agglomerative) Clustering — Average Linkage

Step 0 — compute pairwise Euclidean distances between all 5 customers (rounded to 2 decimals):

Pairwise distances (rows/cols: C1 C2 C3 C4 C5)

d(C1,C2) = sqrt((10-20)^2 + (400-300)^2) = sqrt(100 + 10000) = sqrt(10100) ≈ 100.50
d(C1,C3) = sqrt((10-200)^2 + (400-50)^2) = sqrt(36100 + 122500) = sqrt(158600) ≈ 398.25
d(C1,C4) = sqrt((10-180)^2 + (400-80)^2) = sqrt(28900 + 102400) = sqrt(131300) ≈ 362.35
d(C1,C5) = sqrt((10-15)^2 + (400-380)^2) = sqrt(25 + 400) = sqrt(425) ≈ 20.62

d(C2,C3) = sqrt((20-200)^2 + (300-50)^2) = sqrt(32400 + 62500) = sqrt(94900) ≈ 308.06
d(C2,C4) = sqrt((20-180)^2 + (300-80)^2) = sqrt(25600 + 48400) = sqrt(74000) ≈ 272.03
d(C2,C5) = sqrt((20-15)^2 + (300-380)^2) = sqrt(25 + 6400) = sqrt(6425) ≈ 80.16

d(C3,C4) = sqrt((200-180)^2 + (50-80)^2) = sqrt(400 + 900) = sqrt(1300) ≈ 36.06
d(C3,C5) = sqrt((200-15)^2 + (50-380)^2) = sqrt(34225 + 108900) = sqrt(143125) ≈ 378.32

d(C4,C5) = sqrt((180-15)^2 + (80-380)^2) = sqrt(27225 + 90000) = sqrt(117225) ≈ 342.38

Step 1 — find the smallest distance: d(C1,C5)=20.62 → merge C1 & C5 into cluster A = {C1,C5}.

Step 2 — update distances between new cluster A and remaining points using average linkage:

d(A, C2) = (d(C1,C2) + d(C5,C2)) / 2 = (100.50 + 80.16) / 2 = 90.33
d(A, C3) = (398.25 + 378.32) / 2 = 388.28
d(A, C4) = (362.35 + 342.38) / 2 = 352.37

Step 3 — smallest remaining distance: d(C3, C4)=36.06 → merge B = {C3, C4}.

Step 4 — update distances between clusters A and B and remaining point C2:

d(A, B) = average of distances between members:
  = (d(C1,C3) + d(C1,C4) + d(C5,C3) + d(C5,C4)) / 4
  = (398.25 + 362.35 + 378.32 + 342.38) / 4 ≈ 370.33

d(B, C2) = (d(C3,C2) + d(C4,C2)) / 2 = (308.06 + 272.03) / 2 ≈ 290.04
d(A, C2) from before = 90.33

Step 5 — smallest distance now is d(A, C2)=90.33 → merge into cluster C = {C1,C5,C2}.

Final merge: merge C and B at distance ≈ 370.33. The dendrogram ordering (low→high) shows the first merge C1+C5, then C3+C4, then C2 joining the C1/C5 group, then the final merge.

This manual walk-through illustrates how hierarchical clustering forms nested groups. In practice you compute distances programmatically and plot a dendrogram to choose a cut height that yields the desired number of segments.


6. Python Pipeline — Reproducible Code (Dendrogram + Clusters)

Below is a ready-to-run Python snippet. Replace the sample dataset with your CSV if needed. It standardizes features, computes the Ward linkage dendrogram (often preferred for compact clusters), and extracts cluster labels for a chosen number of clusters.


# Requirements: pandas, numpy, matplotlib, scipy, scikit-learn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

# --- Sample dataset (same five customers used above)
data = {
    'CustomerID': ['C1','C2','C3','C4','C5'],
    'Recency': [10, 20, 200, 180, 15],             # days
    'MonthlySpend': [400, 300, 50, 80, 380]        # USD
}
df = pd.DataFrame(data).set_index('CustomerID')

# 1) Standardize features (important for distance-based clustering)
scaler = StandardScaler()
X = scaler.fit_transform(df[['Recency','MonthlySpend']])

# 2) Compute linkage matrix (Ward's method)
Z = linkage(X, method='ward')   # 'ward' minimizes variance; alternatives: 'single','complete','average'

# 3) Plot dendrogram
plt.figure(figsize=(8,4))
dendrogram(Z, labels=df.index.tolist(), leaf_rotation=0)
plt.title('Hierarchical Clustering Dendrogram (Ward)')
plt.ylabel('Distance (Ward)')
plt.tight_layout()
plt.show()

# 4) Choose number of clusters (e.g., 2) and get labels
n_clusters = 2
agg = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward')  # Ward implies Euclidean distance; the old 'affinity' argument was deprecated and removed in recent scikit-learn
labels = agg.fit_predict(X)
df['Cluster'] = labels

# 5) Cluster summary table
summary = df.groupby('Cluster').agg(
    Count=('Recency','count'),
    MeanRecency=('Recency','mean'),
    MeanMonthlySpend=('MonthlySpend','mean')
).reset_index()
print(summary)
print('\nCluster assignments:')
print(df)

If you run this code you will see the dendrogram and the cluster assignment table printed in the console. You can change n_clusters to 2, 3, or more to inspect different granularities. For larger datasets, compute silhouette scores or use business KPIs to select the right cut.
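One way to compare granularities quantitatively is the silhouette score. The sketch below recomputes Ward clusterings of the standardized toy data for k = 2, 3, 4 (with only five customers, the silhouette is defined for at most k = 4):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Same five customers: (Recency, MonthlySpend)
X_raw = np.array([[10, 400], [20, 300], [200, 50], [180, 80], [15, 380]], dtype=float)
X = StandardScaler().fit_transform(X_raw)

scores = {}
for k in (2, 3, 4):   # silhouette requires 2 <= k <= n_samples - 1
    labels = AgglomerativeClustering(n_clusters=k, linkage='ward').fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f'k={k}: silhouette={scores[k]:.3f}')
```

On this toy data k = 2 scores highest, agreeing with the two visually obvious groups; on real data, weigh the score against business interpretability of the resulting segments.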


7. Example Output Table (sample result for n_clusters = 2)

Cluster | Count | Mean Recency (days) | Mean MonthlySpend (USD)
0       | 3     | 15.0                | 360.0
1       | 2     | 190.0               | 65.0

Interpretation: Cluster 0 (C1,C2,C5) are recent, high-spend customers — ideal for premium loyalty programs and personalized offers. Cluster 1 (C3,C4) are lapsed/low-spend customers — candidates for win-back campaigns or cost-effective reactivation strategies.

8. Visualization

Key visual outputs you should produce for a report:

  1. Dendrogram — shows hierarchical merges and suggests cut height for clusters.
  2. 2D scatter plot (if working with two features) — color points by assigned cluster and show cluster centroids.

The Python code above produces the dendrogram. For the scatter plot, plot Recency vs. MonthlySpend in the original units, color points by assigned cluster, and mark each cluster's centroid (either compute centroids from the raw features or inverse-transform the standardized centers back to the original scale).
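A minimal sketch of that scatter plot (the filename `segments_scatter.png` is an arbitrary choice; centroids are computed directly from the raw features, which avoids inverse-transforming):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')           # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

df = pd.DataFrame({'CustomerID': ['C1', 'C2', 'C3', 'C4', 'C5'],
                   'Recency': [10, 20, 200, 180, 15],
                   'MonthlySpend': [400, 300, 50, 80, 380]}).set_index('CustomerID')

# Cluster on standardized features, but plot in original units
X = StandardScaler().fit_transform(df[['Recency', 'MonthlySpend']])
df['Cluster'] = AgglomerativeClustering(n_clusters=2, linkage='ward').fit_predict(X)

fig, ax = plt.subplots(figsize=(6, 4))
for c, sub in df.groupby('Cluster'):
    ax.scatter(sub['Recency'], sub['MonthlySpend'], label=f'Cluster {c}')
    ax.scatter(sub['Recency'].mean(), sub['MonthlySpend'].mean(),
               marker='x', s=120, color='black')   # cluster centroid in original units
ax.set_xlabel('Recency (days)')
ax.set_ylabel('MonthlySpend (USD)')
ax.legend()
fig.tight_layout()
fig.savefig('segments_scatter.png')
```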

9. Business Interpretation & Targeted Marketing Actions

Using the clusters produced by hierarchical clustering, business teams can define targeted actions:

  • High-value recent customers (Cluster 0): priority retention — exclusive offers, early access, personalized recommendations, premium service lines.
  • Low-value lapsed customers (Cluster 1): low-cost reactivation — win-back email drip, targeted discounts, reminders of new products.
  • Multi-level targeting: The hierarchical tree allows choosing different granularity—e.g., cut into 4 clusters to separate occasional medium spenders from tiny one-time buyers.
  • Measure impact: Use A/B tests to measure lift from segment-specific campaigns (conversion, CLTV uplift) and refine segmentation accordingly.

10. Practical Considerations & Tips

  1. Feature selection: include RFM (Recency, Frequency, Monetary), product affinity vectors, channel preference, or demographics as appropriate.
  2. Scaling: always standardize numeric features before computing Euclidean distances.
  3. Linkage choice: Ward for compact clusters; complete to avoid chaining; average for balanced behavior. Try several and compare business meaning.
  4. Outliers: can distort hierarchy; consider trimming or robust scaling (e.g., log-transform monetary values) before clustering.
  5. Interpretability: present centroids (or median values) of clusters to stakeholders and give each cluster a descriptive name (e.g., “Recent Premiums”, “Dormant Bargain Shoppers”).
  6. Operationalization: export cluster assignment to the CRM to power targeted campaigns; monitor drift monthly and rebuild segments when customer behavior changes.
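For tip 6, a sketch of cutting the tree by distance (rather than a fixed cluster count) and exporting the assignments; the output filename `segments_for_crm.csv` and the cut height t=2.0 are illustrative choices for this toy data, not fixed recommendations:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'CustomerID': ['C1', 'C2', 'C3', 'C4', 'C5'],
                   'Recency': [10, 20, 200, 180, 15],
                   'MonthlySpend': [400, 300, 50, 80, 380]}).set_index('CustomerID')

X = StandardScaler().fit_transform(df[['Recency', 'MonthlySpend']])
Z = linkage(X, method='ward')

# Flat clusters: every merge below the cut height t stays together
df['Segment'] = fcluster(Z, t=2.0, criterion='distance')

df.to_csv('segments_for_crm.csv')   # hand the assignments to the CRM / campaign tool
print(df['Segment'].value_counts())
```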

11. Final Checklist (to run on your dataset)

  1. Prepare features and handle missing data.
  2. Scale numeric features (StandardScaler).
  3. Compute linkage (try Ward, average, complete) and plot dendrogram.
  4. Choose cut height or number of clusters using business KPIs, silhouette scores, or elbow-like inspections on dendrogram.
  5. Profile clusters, give descriptive labels, design campaign actions for each segment.
  6. Deploy segments to marketing automation and measure lift.

— End of Market Segmentation Case Study —