Analytics
Predictive LTV Models for DTC Brands
Predictive customer lifetime value models for DTC: probabilistic BTYD, gradient boosted, and simpler cohort projection depending on scale and use case.
What you get
Deliverables, not deliverable-ish.
Scoped plan
Written scope with success criteria, not a vague retainer.
Senior execution
The person scoping the work is the person doing the work.
Measurable output
Deliverables you can point at. Dashboards, flows, code, docs.
Clean handoff
Documentation and training so the work lives inside your team.
How we work
Our approach.
The problem LTV modeling solves
You know what your customers were worth two years ago. You can see it in the cohort analysis. What you need to know is what the cohort you acquired last month will be worth, because that is the cohort you are making decisions about right now. Paid media targeting wants it to seed value-based lookalikes. Email segmentation wants it to prioritize VIPs before they cross the threshold. Finance wants it to forecast contribution margin six quarters out. Every team needs a forward-looking number and the historical cohort view does not provide one.
The common workaround is a simple spreadsheet calculation: average order value times expected order count times contribution margin rate. This breaks for three reasons. Expected order count is the mean of a long-tailed distribution, and that mean tells you little about any individual customer. Contribution margin rate shifts with product mix and discount depth in ways the spreadsheet does not model. And the whole calculation assumes stationarity, which is to say it assumes next year's customers will behave like last year's, which they almost never do when channel mix and product mix are both shifting.
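The failure mode of the mean is easy to demonstrate with made-up numbers. A minimal sketch, assuming an illustrative order-count distribution, AOV, and margin rate (none of these figures come from a real brand):

```python
from statistics import mean, median

# Synthetic 12-month order counts for 10 customers of a hypothetical brand:
# most buy once or twice, a few repeat heavily (the long tail).
order_counts = [1, 1, 1, 1, 1, 2, 2, 3, 8, 15]

avg_orders = mean(order_counts)   # 3.5, pulled up by two heavy repeaters
typical = median(order_counts)    # 1.5, closer to what a typical customer does

aov = 60.0            # average order value, illustrative
margin_rate = 0.40    # contribution margin rate, illustrative

# The spreadsheet LTV uses the mean, so it overstates most customers:
spreadsheet_ltv = avg_orders * aov * margin_rate
print(spreadsheet_ltv)  # 84.0, yet 7 of 10 customers place 2 orders or fewer
```

The point of the sketch: the spreadsheet number describes the cohort in aggregate, not any customer you could actually bid on or segment.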
Proper LTV modeling is a probabilistic exercise, not an arithmetic one. Which approach fits depends on scale and data richness. A brand with eighteen months of history and ten thousand customers needs cohort projection. A brand with three years of history and one hundred thousand customers can support a BTYD model. A brand above thirty million in revenue with rich feature data can train a gradient boosted model. Choosing the wrong approach either overfits or under-resolves, and both produce numbers the business should not trust.
Our approach
We run an eight week LTV modeling engagement.
Step one is scope and approach selection. We audit the data, assess cohort maturity, count distinct customers, and pick an approach: cohort projection, probabilistic BTYD, or a gradient boosted model. We document the choice and its tradeoffs. We do not deploy models the data cannot support.
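The scale thresholds described earlier can be encoded as a rough decision rule. A minimal sketch; the function name, inputs, and exact cutoffs are illustrative restatements of the rules of thumb above, not a substitute for the audit:

```python
def select_ltv_approach(months_of_history: int,
                        distinct_customers: int,
                        annual_revenue_usd: float,
                        rich_feature_data: bool) -> str:
    """Illustrative encoding of the rules of thumb; real selection
    also weighs data quality and cohort maturity."""
    # Above ~$30M revenue with rich feature data: gradient boosted model.
    if (annual_revenue_usd >= 30_000_000 and rich_feature_data
            and distinct_customers >= 100_000):
        return "gradient_boosted"
    # Three years of history and ~100k customers: probabilistic BTYD.
    if months_of_history >= 36 and distinct_customers >= 100_000:
        return "btyd"
    # Everything smaller: cohort projection.
    return "cohort_projection"

print(select_ltv_approach(18, 10_000, 5_000_000, False))    # cohort_projection
print(select_ltv_approach(36, 100_000, 20_000_000, False))  # btyd
```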
Step two is data preparation. We require the cohort table from our cohort analysis engagement or we build one as part of this scope. We assemble training features: first order attributes, acquisition channel, first product, first discount, early behavioral signals like email engagement in days zero through thirty, and contribution margin rate per order. We document every feature.
Step three is model build and validation. For cohort projection we fit decay curves per cohort dimension and extrapolate. For BTYD we fit BG/NBD for order frequency and Gamma-Gamma for monetary value using the lifetimes library in Python. For the gradient boosted model we train LightGBM or XGBoost on twelve month contribution margin as the target, with temporal cross-validation so we are not leaking the future into the past. We report model error in concrete terms.
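For the cohort projection variant, the decay-curve fit can be sketched in plain Python. The retention numbers, AOV, and margin rate below are illustrative, and a real engagement fits a curve per cohort dimension rather than one global curve:

```python
import math

# Observed repeat-purchase rate per month since first order for one
# cohort (illustrative): share of the cohort ordering in month t.
months = [1, 2, 3, 4, 5, 6]
repeat_rate = [0.30, 0.21, 0.15, 0.11, 0.08, 0.06]

# Fit an exponential decay a * exp(-b * t) by ordinary least squares
# on the log scale: log(rate) = log(a) - b * t.
n = len(months)
xs, ys = months, [math.log(r) for r in repeat_rate]
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = -sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
    sum((x - x_bar) ** 2 for x in xs)
a = math.exp(y_bar + b * x_bar)

# Extrapolate months 7 through 12 and roll everything up into a
# twelve month contribution margin LTV per cohort member.
aov, margin_rate = 60.0, 0.40
projected = [a * math.exp(-b * t) for t in range(7, 13)]
orders_per_member = 1.0 + sum(repeat_rate) + sum(projected)
#                   first    observed          extrapolated
ltv_12m = orders_per_member * aov * margin_rate
```

The same skeleton extends to the extrapolation horizon finance cares about; the BTYD and gradient boosted variants replace the curve fit, not the rollup.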
Step four is operational pipeline. The model is only useful if the predicted LTV lands where decisions get made. We push scores to Klaviyo as a custom property, to Meta as a value-based lookalike seed via server-side CAPI, to Shopify as a customer metafield, and to the warehouse for ongoing analysis. The pipeline is scheduled daily for behavioral signal updates and is documented end to end.
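A score push like the Klaviyo custom property reduces to building a profile-update payload per customer. A hedged sketch: the field names, band thresholds, and payload shape below are illustrative, not Klaviyo's actual API contract, so check the destination's current API reference before wiring anything up.

```python
def build_profile_update(email: str, predicted_ltv: float) -> dict:
    """Hypothetical payload for pushing a predicted LTV score to a
    profile store such as Klaviyo; all field names are illustrative."""
    return {
        "data": {
            "type": "profile",
            "attributes": {
                "email": email,
                "properties": {
                    # Round to whole dollars and band the score so flows
                    # trigger on a stable segment, not a jittery float.
                    "predicted_ltv_12m": round(predicted_ltv),
                    "ltv_band": ("vip" if predicted_ltv >= 300
                                 else "core" if predicted_ltv >= 90
                                 else "low"),
                },
            },
        }
    }

payload = build_profile_update("jane@example.com", 327.4)
```

Banding matters in practice: email flows and lookalike seeds behave better on a small set of stable tiers than on a raw regression output that shifts a few dollars every daily refresh.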
Step five is handoff and retraining. We document the retraining procedure, automate it where feasible with scheduled dbt plus Python jobs, and train the client team on how to interpret the scores. We hand over the code in the client's repo so the pipeline survives our departure.
What you get
▸ Approach selection document with rationale for the chosen modeling method.
▸ Feature table in the warehouse with every input feature documented.
▸ Trained LTV model in Python or dbt, version controlled in your repo.
▸ Model validation report with error bounds on held-out data.
▸ Per-customer predicted twelve month contribution margin LTV, refreshed daily.
▸ Klaviyo integration pushing LTV scores as a custom property.
▸ Meta CAPI integration pushing LTV as a value-based lookalike seed.
▸ Shopify metafield integration for merchandising and customer service use.
▸ Warehouse table of daily predictions for ongoing analysis.
▸ Retraining automation scheduled and documented.
▸ Training session for marketing, merchandising, and finance teams.
▸ Quarterly review template for model drift monitoring.
Timeline
Eight weeks in four phases.
Weeks one and two are scope and data preparation. We pick the approach, audit and assemble the features, and validate data quality.
Weeks three through five are model build and validation. We train the chosen model, evaluate on held-out data, and iterate until error bounds are acceptable.
Weeks six and seven are the operational pipeline build. We wire Klaviyo, Meta CAPI, Shopify metafields, and the warehouse layer. We schedule retraining.
Week eight is handoff and training. We deliver the validation report, train each stakeholder group, and transition ownership.
Mini case anatomy
A pet supplies brand in the forty to sixty million revenue range had a spreadsheet LTV calculation that put twelve month LTV at roughly one hundred ninety dollars. Paid media was using that number to set CAC ceilings. Finance was using it to forecast contribution margin. We ran the full engagement and trained a gradient boosted model on three and a half years of cohort data.
The model revealed that the spreadsheet average masked a long tail. Subscription customers had a predicted twelve month LTV above three hundred twenty dollars. One-time buyers of the starter product had a predicted twelve month LTV below ninety dollars. The spreadsheet average was the mean of a bimodal distribution and no individual customer was close to it. The paid team had been under-bidding on subscription-intent audiences and over-bidding on broad prospecting that skewed one-time.
Six months after the model landed, blended paid CAC had dropped seventeen percent on flat spend because the Meta value-based lookalikes were seeding on predicted LTV rather than first order value. Subscription penetration in new customer orders rose from eleven to seventeen percent because Klaviyo flows were triggering on predicted LTV score. For the underlying logic see our post on ecommerce customer lifetime value.
Related reading
LTV modeling builds on cohort analysis and benefits from clean server-side tagging for the Meta CAPI integration. For the broader picture, see our analytics and reporting hub and our guides on attribution for DTC using MER and break-even ROAS.
FAQ
Questions we hear most.
Other ecommerce analytics and reporting services
Let's see if we're a fit.
15 minutes. We'll tell you whether this service fits where you are. If not, we'll name what does.
Book a 15-min call