AI Customer Segmentation Beyond The Default RFM Buckets
9 min read · 22 March 2026

A $5M skincare brand sat across the table from me last quarter with a problem they could not name. Klaviyo flow revenue had been flat for nine months. They had built every flow the certification course recommended. They were running seven of the default RFM cohorts. The Champions tier was getting a Champions sequence. The At-Risk tier was getting a winback. The Lost tier was getting a reactivation. Every flow looked correct on paper.
The flows were not the problem. The segments were the problem. The brand was sending the same Champions-tier message to a customer who replenishes their serum every 28 days and a customer who buys once a year for a Christmas gift. Both customers landed in the same RFM bucket because the bucket was scored on recency, frequency, and monetary value across the full catalogue. The 28-day replenisher and the annual gifter had similar RFM scores. They had nothing else in common, and the message that worked for one quietly failed for the other.
A Brand Pumping Out Champions-Tier Emails To The Wrong People
The numbers were embarrassing in retrospect. The brand was running Klaviyo on a $50,000-a-year contract and a 47-flow library that had taken eight months to build. Open rates were healthy. Click rates were fine. Revenue per recipient was middle-of-the-pack for the brand's size, and that was the problem: a middle-of-the-pack number on the Klaviyo defaults masks the fact that two distinct sub-populations are being served the same message.
The Champions tier illustrated the failure cleanly. Klaviyo's default Champions definition uses recency under 30 days, frequency above the brand's median, and monetary value in the top quartile. By that math, a customer who placed an $80 replenishment order three weeks ago is a Champion. A customer who placed an $800 holiday gift bundle three weeks ago is also a Champion. The first customer is on a 28-day replenishment cadence. The second customer will not buy again for 11 months. The "Champions Welcome Back" flow lands in both inboxes and is wrong for both. The first customer was not gone. The second customer is not coming back this quarter regardless of the message.
Klaviyo's benchmarks anchor the scale of the gap. Klaviyo's own 2024 Enterprise Ecommerce Benchmarks Report shows email flows generating nearly 41 percent of total email revenue from just 5.3 percent of sends, with automated workflows producing 30 times more revenue per recipient than campaigns. The flow-revenue concentration is enormous, which is exactly why segment precision matters. The gap between an average segment and a precision cluster is the gap between a mid-pack flow stack and a top-quartile flow stack, and the gap is wider than most operators realise.
Klaviyo's segmentation guide is the company's own segmentation playbook, and it is useful for naming the villain precisely. The default segments are recency-frequency-monetary cohorts, with predefined definitions that work as a baseline. They were never designed to be the final segmentation. Klaviyo's RFM how-to lists the predefined segments operators inherit out of the box: Champions, Loyal, Potential Loyalists, New Customers, Promising, Need Attention, About to Sleep, At Risk, Can't Lose Them, Hibernating, Lost. That is eleven buckets calibrated to the catalogue average, applied uniformly to a customer base whose actual behaviour is multi-modal.
Klaviyo's RFM research post on its engineering blog explains the math behind the RFM scoring. The math is sound for what it does. What it does is normalise behaviour against the brand's distribution and bucket customers into eleven groups defined by relative position. What it does not do is recognise that two customers with identical RFM scores can have completely different replenishment cadences, basket compositions, and category preferences. The math collapses three dimensions into one ranking, and the collapse is where the revenue leaks.
The skincare brand was not under-segmenting. It was mis-segmenting. The defaults were the wrong dimensions for a physical product brand whose customer base clusters on cadence, not on recency.
Why The Math Doesn't Work: The Replenishment Cadence Gap
The contribution-margin math on the skincare brand was the part that finally moved the conversation. Their average order value was $74. Their gross margin was 65 percent. Their email-attributed revenue was 28 percent of total revenue, which sounds healthy until you decompose it by cohort.
The 28-day replenisher cohort had a 60-day repeat rate above 70 percent. The annual gifter cohort had a 60-day repeat rate below 8 percent. Both cohorts were receiving the same Champions Welcome Back flow with a 15 percent off coupon for any order over $60. The replenisher used the coupon every cycle. The brand was discounting customers who would have bought at full price, giving away roughly $11 per order (15 percent of the $74 average order value) on replenishments that would have happened anyway. The gifter ignored the coupon entirely because they were not in-market. The brand was paying for sends that produced zero revenue from one cohort and discounting full-margin orders from the other.
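The per-order arithmetic is worth writing down. What follows is a minimal sketch using the figures already quoted ($74 average order value, 65 percent gross margin, 15 percent coupon). It assumes the discounted order lands at exactly the average order value and that cost of goods does not move with the coupon; both are simplifications for illustration, not facts about the brand's books.

```python
AOV = 74.00          # average order value, from the text
GROSS_MARGIN = 0.65  # gross margin, from the text
COUPON = 0.15        # Champions Welcome Back coupon rate, from the text


def coupon_cost(aov=AOV, gross_margin=GROSS_MARGIN, coupon=COUPON):
    """Margin impact of discounting an order that would have shipped at full price.

    Assumes the order is exactly one average order and that cost of goods
    is unchanged by the coupon (illustrative simplifications).
    """
    discount = aov * coupon
    full_margin = aov * gross_margin
    return {
        "discount_per_order": round(discount, 2),
        "full_price_margin": round(full_margin, 2),
        "margin_after_coupon": round(full_margin - discount, 2),
        "share_of_margin_given_away": round(discount / full_margin, 3),
    }


print(coupon_cost())
```

On those assumptions the coupon costs about $11.10 an order, which is a little under a quarter of the gross margin on a full-price replenishment.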
Multiply that across nine months and the brand had bled an estimated $180,000 in margin to a flow stack that looked correct on the dashboard. The flow was not broken. The segment was broken. The flow was correctly serving its assigned segment, and its assigned segment lumped together two populations that needed completely different messages.
The replenishment-cadence dimension was invisible inside RFM. RFM measures when the customer last bought, how often they buy, and how much they spend. RFM does not measure the inter-purchase gap distribution, which is the dimension that separates a 28-day replenisher from an annual gifter. The two customers can have identical recency and frequency scores at any given moment in time. The difference between them is the predicted next-purchase window, and predicted-window is not a feature inside the Klaviyo defaults.
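A small sketch makes the blind spot concrete. The two order histories below are invented for illustration, and frequency is taken as a simple lifetime order count, which is an assumption rather than Klaviyo's exact scoring. The point is only that recency and frequency can match while the inter-purchase gap does not.

```python
from datetime import date
from statistics import mean


def rf_and_gap(order_dates, today):
    """Recency, frequency, and mean inter-purchase gap for one customer."""
    dates = sorted(order_dates)
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        "recency_days": (today - dates[-1]).days,
        "frequency": len(dates),  # lifetime order count (illustrative choice)
        "mean_gap_days": mean(gaps) if gaps else None,
    }


today = date(2026, 1, 10)

# Hypothetical histories: a 28-day replenisher and a once-a-year gifter.
replenisher = [date(2025, 10, 25), date(2025, 11, 22), date(2025, 12, 20)]
gifter = [date(2023, 12, 20), date(2024, 12, 20), date(2025, 12, 20)]

print(rf_and_gap(replenisher, today))
print(rf_and_gap(gifter, today))
```

Both customers come out at 21 days of recency and three orders. The mean gap is 28 days for one and 365.5 days for the other, and that is the only number in the output that tells you when to send the next message.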
The Behavioural Cluster Engine Blueprint
The replacement is The Behavioural Cluster Engine. The principle is single-sentence simple: replace the recency-frequency-monetary frame with an unsupervised clustering frame on cadence, basket composition, and category affinity, with replenishment-window prediction as the central feature.
The Engine has three feature dimensions. Cadence captures the mean inter-purchase gap and its standard deviation, separating the 28-day replenisher from the 90-day cyclical buyer from the annual gifter. Basket composition captures SKU-set similarity, separating the customer who buys the full routine from the customer who buys only the cleanser. Category affinity captures share-of-wallet by department, separating the skincare-only customer from the skincare-plus-haircare customer.
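The basket and category dimensions reduce to two small calculations. This is a hedged sketch: the function names, SKU labels, and spend figures are hypothetical, and Jaccard similarity is used here as one reasonable reading of "SKU-set similarity".

```python
def jaccard(skus_a, skus_b):
    """Jaccard similarity between two SKU sets: |A and B| / |A or B|."""
    a, b = set(skus_a), set(skus_b)
    if not a and not b:
        return 1.0  # two empty baskets are treated as identical
    return len(a & b) / len(a | b)


def category_share(spend_by_category):
    """Share of wallet by department, as fractions that sum to 1."""
    total = sum(spend_by_category.values())
    if total == 0:
        return {}
    return {cat: amount / total for cat, amount in spend_by_category.items()}


# Hypothetical customers: a full-routine buyer versus a cleanser-only buyer.
full_routine = {"cleanser", "serum", "moisturiser"}
cleanser_only = {"cleanser"}
print(jaccard(full_routine, cleanser_only))

# Hypothetical spend: skincare-plus-haircare customer.
print(category_share({"skincare": 150.0, "haircare": 50.0}))
```

The cleanser-only customer overlaps with the full routine on one SKU out of three, so the similarity is one third. The spend example splits 75 percent skincare and 25 percent haircare.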
The clustering algorithm is k-means with k chosen by the elbow method. Shopify's segmentation guide names k-means and DBSCAN as the standard clustering methods for this work, and k-means wins for most physical product brands because it produces interpretable clusters with stable centroids. The k-means CPG paper is a peer-reviewed study showing k-means performs well on purchase-behaviour segmentation in CPG contexts, and the paper supports the Phase 1 technical decisions.
The hard rule on cluster size is 500 customers per cluster minimum. Below 500, the cluster is noise, the email metrics are unstable, and the cluster cannot support a flow stack. At or above 500, the cluster is signal and the flow can be tuned to the cluster's actual behaviour. The 500-customer floor is the single most important rule in the Engine, because over-segmenting kills the math faster than under-segmenting.
I have run the Engine on enough Shopify Plus brands now that the cluster count is predictable. Most physical product brands land between five and nine real clusters, regardless of how many RFM buckets they were running before. Below five, the brand has not separated the cadence dimensions properly. Above nine, the brand has over-segmented and the smaller clusters need to be merged back into their nearest neighbour.
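Merging an undersized cluster into its nearest neighbour can be sketched in a few lines. This is an illustration under assumptions, not a fixed recipe: "nearest" is taken as Euclidean distance between centroids on already-standardised features, the merged centroid is the size-weighted mean, and the cluster names and numbers are hypothetical.

```python
import math


def merge_small_clusters(centroids, sizes, min_size=500):
    """Fold clusters below min_size into their nearest neighbour.

    centroids: dict of cluster id -> tuple of (standardised) feature values
    sizes:     dict of cluster id -> customer count
    Returns (mapping of original id -> surviving id, surviving sizes).
    """
    centroids, sizes = dict(centroids), dict(sizes)
    mapping = {cid: cid for cid in centroids}
    while len(centroids) > 1:
        smallest = min(sizes, key=sizes.get)
        if sizes[smallest] >= min_size:
            break  # every remaining cluster meets the floor
        target = min(
            (cid for cid in centroids if cid != smallest),
            key=lambda cid: math.dist(centroids[cid], centroids[smallest]),
        )
        n_small, n_target = sizes[smallest], sizes[target]
        # Size-weighted mean keeps the merged centroid where the customers are.
        centroids[target] = tuple(
            (n_small * s + n_target * t) / (n_small + n_target)
            for s, t in zip(centroids[smallest], centroids[target])
        )
        sizes[target] = n_small + n_target
        del centroids[smallest], sizes[smallest]
        mapping = {old: target if new == smallest else new
                   for old, new in mapping.items()}
    return mapping, sizes


# Hypothetical centroids on two standardised features.
centroids = {
    "replenishers": (-1.0, 0.8),
    "cyclical": (0.5, 0.1),
    "near_replenishers": (-0.8, 0.6),
}
sizes = {"replenishers": 4000, "cyclical": 1200, "near_replenishers": 180}
mapping, merged_sizes = merge_small_clusters(centroids, sizes)
print(mapping)
print(merged_sizes)
```

Here the 180-customer cluster sits closest to the replenishers, so it is folded in and the brand is left with two clusters of 4,180 and 1,200.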
Execution: Day 0 To Day 90
The execution rolls in three blocks: data plumbing, clustering, and flow rebuild.
Day 0 to Day 30: data plumbing. Pull Shopify event data for the prior 24 months: order ID, customer ID, product ID, SKU, line value, order date, refund flag. Stitch the order-level data to Klaviyo profile properties: email engagement score, SMS subscribed flag, last open date. The output is a single customer table with one row per customer and engineered features for cadence (mean inter-purchase gap, standard deviation, last-purchase-date offset), basket composition (top 3 SKUs, SKU-set Jaccard similarity to brand average), and category affinity (share of spend by department, top category).
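The cadence half of that customer table can be sketched as below. The row layout (dicts keyed `customer_id`, `order_id`, `order_date`, `refunded`) is a hypothetical stand-in for whatever the Shopify export actually looks like, and dropping refunded lines is an assumption about how the refund flag should be used.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def cadence_features(line_rows, as_of):
    """Roll order-line rows up to one row of cadence features per customer."""
    orders = defaultdict(dict)  # customer_id -> {order_id: order_date}
    for row in line_rows:
        if row["refunded"]:
            continue  # assumption: refunded lines do not count as purchases
        orders[row["customer_id"]][row["order_id"]] = row["order_date"]

    table = {}
    for customer_id, by_order in orders.items():
        dates = sorted(by_order.values())
        gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
        table[customer_id] = {
            "orders": len(dates),
            "mean_gap_days": mean(gaps) if gaps else None,
            # Population standard deviation, so a single gap yields 0.0.
            "gap_std_days": pstdev(gaps) if gaps else None,
            "days_since_last": (as_of - dates[-1]).days,
        }
    return table


def _row(customer_id, order_id, order_date, refunded=False):
    return {"customer_id": customer_id, "order_id": order_id,
            "order_date": order_date, "refunded": refunded}


# Hypothetical export: customer A has a two-line order and one refunded order.
rows = [
    _row("A", "o1", date(2025, 10, 25)),
    _row("A", "o1", date(2025, 10, 25)),
    _row("A", "o2", date(2025, 11, 22)),
    _row("A", "o3", date(2025, 12, 20)),
    _row("A", "o4", date(2026, 1, 5), refunded=True),
    _row("B", "o5", date(2025, 12, 1)),
]
features = cadence_features(rows, as_of=date(2026, 1, 10))
print(features)
```

Single-order customers come back with no gap at all, which matters in practice: they cannot be placed on the cadence dimension until a second order arrives, so they either need a fallback value or a cluster of their own.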
Day 31 to Day 60: clustering. Run k-means on the engineered feature set for k between 4 and 12. Pick k by the elbow method on the within-cluster sum-of-squares plot. Validate that every cluster has at least 500 customers; if not, reduce k by one and re-run. Name each cluster by behaviour, not demographics. The skincare brand's seven clusters ended up named Replenishers, Cyclical Loyalists, Routine Builders, Single-Product Browsers, Annual Gifters, Sale-Driven Buyers, and Lapsed Replenishers. The names are operational, not marketing. They tell the email lead what message to send.
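The loop described above can be sketched without any dependencies. Treat this as an illustration only: a real run would more likely lean on a library implementation such as scikit-learn's `KMeans`, the features would need standardising first, and the elbow itself is still read by eye from the curve. The farthest-first seeding is a simplification chosen so the toy example is repeatable (it is sensitive to outliers), and the toy points and the floor of 3 are invented.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly take the
    point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, wcss, cluster sizes)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to empty out.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    sizes = [labels.count(j) for j in range(k)]
    return labels, centroids, wcss, sizes


def wcss_curve(points, ks=range(4, 13)):
    """Within-cluster sum of squares per k, the series the elbow plot shows."""
    return {k: kmeans(points, k)[2] for k in ks}


def enforce_floor(points, k, min_size=500):
    """Step k down by one until every cluster meets the size floor."""
    while k > 1:
        result = kmeans(points, k)
        if min(result[3]) >= min_size:
            return k, result
        k -= 1
    return 1, kmeans(points, 1)


# Toy data: two larger groups and one tiny one, with a floor of 3 for illustration.
blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5)]
blob_b = [(10, 0), (10, 1), (11, 0), (11, 1), (10.5, 0.5), (10, 0.5)]
blob_c = [(0, 10), (1, 10)]
points = blob_a + blob_b + blob_c

chosen_k, (labels, centroids, wcss, sizes) = enforce_floor(points, k=3, min_size=3)
print(chosen_k, sizes)
```

At k=3 the two-point group forms its own cluster and fails the floor, so the loop drops to k=2 and the tiny group is absorbed into its nearer neighbour. On real data the same check runs with the defaults: k from 4 to 12 for the curve and a floor of 500.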
Day 61 to Day 90: flow rebuild. Each cluster gets its own flow stack tuned to its behaviour. Replenishers get a cadence-aware reminder triggered at 80 percent of their predicted replenishment window, with no discount. Annual Gifters get an event-driven holiday flow and a quiet rest of the year. Cyclical Loyalists get a category-rotation cadence. The Engine does not require new flow infrastructure. It requires the existing flows to be remapped against the new clusters and the discount logic to be tuned per cluster.
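The Replenishers trigger is a one-line date calculation. A minimal sketch, assuming the predicted replenishment window is expressed in days since the last order (for example the customer's mean inter-purchase gap); the function name and the example date are hypothetical.

```python
from datetime import date, timedelta


def reminder_date(last_order, predicted_window_days, fraction=0.8):
    """Date to fire the reminder: `fraction` of the way through the window."""
    return last_order + timedelta(days=round(predicted_window_days * fraction))


# A 28-day replenisher who last ordered on 1 January.
print(reminder_date(date(2026, 1, 1), 28))
```

For a 28-day window the reminder lands 22 days after the last order, on 23 January, a few days before the customer would be expected to run out.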
Shopify's segments docs cover the Shopify Audiences segments that some brands run alongside Klaviyo. The Audiences segments are the second named villain in this article and the same critique applies: they are pre-built, they are calibrated to the catalogue average, and they do not capture replenishment cadence. The Engine replaces both Klaviyo defaults and Shopify Audiences with brand-specific clusters, and the brand stops paying twice for two systems that produce the same generic buckets.
The Smartbug Klaviyo report is a practitioner-annotated benchmark report and is useful for sanity-checking the post-rebuild numbers. The report's flow-revenue percentiles tell you whether the Engine is producing top-quartile revenue per recipient. Below the 75th percentile after the rebuild, the clusters are wrong. At or above the 75th percentile, the Engine is doing what the Engine is designed to do.
From Klaviyo Defaults To Cluster-Tuned Flow Revenue
The skincare brand's flow revenue per recipient lifted by 47 percent in the first 60 days after the Engine rebuild. The Replenishers cluster alone accounted for almost half the lift, because the cadence-aware reminder caught customers at the right point in their replenishment window without the 15 percent discount the old flow had been spending. The Annual Gifters cluster contributed another quarter of the lift, by silence: the brand stopped sending nine wasted flows a quarter to a cohort that was not in-market.
The math is what convinces the team. Flow revenue per recipient was the metric the brand had been stuck on for nine months, and 47 percent in 60 days is the kind of move that gets attention. The metric matters because it is the only flow metric that is not gameable. Open rates can be inflated by send timing. Click rates can be inflated by subject line tactics. Revenue per recipient is the cohort's actual contribution against the cohort's actual reach, and it is the metric the Behavioural Cluster Engine is designed to move.
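Revenue per recipient and its lift are simple ratios, and writing them out shows why sending less can move the number. The figures below are hypothetical, chosen only so the lift matches the 47 percent quoted for the brand; they are not the brand's actual revenue or send counts.

```python
def revenue_per_recipient(flow_revenue, recipients):
    """Flow revenue divided by the number of recipients who were sent the flow."""
    if recipients <= 0:
        raise ValueError("recipients must be positive")
    return flow_revenue / recipients


def lift(before, after):
    """Relative change from `before` to `after`, as a fraction."""
    return (after - before) / before


# Hypothetical quarter before and after the rebuild.
rpr_before = revenue_per_recipient(20_000, 40_000)  # $0.50 per recipient
rpr_after = revenue_per_recipient(22_050, 30_000)   # $0.735 per recipient
print(round(lift(rpr_before, rpr_after), 2))
```

In this made-up example revenue rises by about 10 percent while recipients fall by a quarter, and the two together produce a 47 percent lift in revenue per recipient. That is the mechanism behind the Annual Gifters contribution: the denominator shrinks when a cohort that is not in-market stops receiving flows.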
Operators who run the Engine see flow revenue per recipient lift somewhere in the 30 to 60 percent band. The lift is concentrated in the cadence-aware clusters, where the predicted-window feature is doing real work. The lift is smaller in the basket-composition and category-affinity clusters, where the message tuning produces a real but smaller effect. The aggregate across all clusters is what shows up in the brand's monthly email revenue line, and the aggregate is what gets reported to the CFO.
You do not need a more expensive ESP. You need to stop running RFM defaults on a customer base whose behaviour is multi-modal, and you need to cluster on the dimensions that actually predict next-purchase behaviour for your brand. The Behavioural Cluster Engine is the discipline that gets the brand from middle-of-the-pack to top-quartile, and the only thing it requires is taking the segmentation work seriously enough to do it once, properly, on the brand's own data.