INSIGNIA.
CASE0001
PG. SUB · retail-forecast
SHIPPED · CASE / 2025.04
/Retail-Group/

Aisle 19

Hierarchical demand forecasting, long-tail SKUs

RETAIL
/ 01
Stakes

A 1,200-store multi-format retailer was carrying about 25% more stock than it needed because the legacy forecast couldn't tell pasta from imported truffle salt. Top-selling SKUs got accurate predictions; everything in the long tail got rounded up to safety stock, just in case. The CFO saw the inventory line; the merchandising team saw stockouts on the same SKUs the warehouse was overstocked on. Category managers had stopped trusting the system years ago and were overriding orders by gut feel, which moved the bias around without removing it. The fix wasn't a better single model; it was a model that respected the shape of the catalog: the fast-movers learn from themselves, the long tail borrows strength from its category and region, and signals that move demand (weather, promo, regional events, macro) get a seat at the table instead of being averaged out.

/ 02
Approach
01

Hierarchical fit at SKU × store × season

Instead of a flat per-SKU model, the forecast learns at three grains at once: SKU, store, season. The fast-movers carry enough history to learn their own seasonality cleanly. The long tail (most of the catalog by SKU count, a small slice of revenue) gets pooled through its category × region group and shrinks toward the group mean when its own signal is too sparse to trust. The shrinkage strength is learned per group, not set globally.
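A minimal sketch of the shrinkage idea, assuming a simple empirical-Bayes weighting rather than the production model. The function name `shrink_to_group` and the inputs `noise_var` and `between_var` are hypothetical, and the demand figures are made up. In the real system the shrinkage strength is learned per group; here it is simply passed in.

```python
import numpy as np

def shrink_to_group(sku_means, sku_counts, noise_var, between_var):
    """Pull each SKU's mean demand toward its category x region group mean.

    weight = between_var / (between_var + noise_var / n)
    A SKU with a long history (large n) keeps its own mean; a sparse SKU
    leans on the group. Both variances are assumed inputs for this sketch.
    """
    sku_means = np.asarray(sku_means, dtype=float)
    sku_counts = np.asarray(sku_counts, dtype=float)
    # Group mean, weighted by how much history each SKU contributes
    group_mean = np.average(sku_means, weights=sku_counts)
    weight = between_var / (between_var + noise_var / sku_counts)
    shrunk = weight * sku_means + (1.0 - weight) * group_mean
    return shrunk, weight

# Three SKUs in one group: two with a year of history, one with six sales days
shrunk, weight = shrink_to_group(
    sku_means=[2.0, 3.0, 9.0],
    sku_counts=[365, 365, 6],
    noise_var=4.0,
    between_var=0.5,
)
print(weight.round(2), shrunk.round(2))
```

The sparse third SKU gets the smallest weight, so its estimate moves furthest toward the group mean, while the two well-observed SKUs barely move.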

02

Causal signal layer with lag alignment

Four signal families feed in above the base model: weather (lag 0 to 14 days), macro (CPI, FX, regional unemployment), internal promo calendar, regional events. The lags are not picked by hand; they are searched per category, because a hot weekend lifts beer the same day and ice cream three days later. The signal layer is additive on top of the base; toggling a signal off at inference time is a clean delta you can show a category manager.
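A rough sketch of a per-category lag search over the 0 to 14 day window, assuming absolute Pearson correlation as the score. That scoring rule and the name `best_lag` are assumptions for illustration; the real search may well score lags on held-out forecast error instead. The data below is synthetic.

```python
import numpy as np

def best_lag(signal, demand, max_lag=14):
    """Return the lag (in days) at which the signal best lines up with later demand."""
    signal = np.asarray(signal, dtype=float)
    demand = np.asarray(demand, dtype=float)
    scores = []
    for lag in range(max_lag + 1):
        x = signal[: len(signal) - lag]   # signal on day t
        y = demand[lag:]                  # demand on day t + lag
        scores.append(abs(np.corrcoef(x, y)[0, 1]))
    return int(np.argmax(scores)), scores

# Synthetic check: demand echoes the signal three days later, plus noise
rng = np.random.default_rng(0)
heat = rng.normal(size=200)
ice_cream = np.roll(heat, 3) + 0.1 * rng.normal(size=200)
lag, _ = best_lag(heat, ice_cream)
print(lag)
```

Running the same search once per category and signal gives each category its own lag, which is the point of not picking lags by hand.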

03

Reconciliation for coherence

After the base model emits per-SKU per-store forecasts, a reconciliation step (MinT-style weighted least squares) enforces the hierarchy: store totals must add up to chain totals; category totals must add up to banner totals. Without this step the forecasts disagree with themselves and the planner can't trust any single level. With it, the same number rolls up cleanly from a single jar of olives to the chain Q3 plan.
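To make the reconciliation step concrete, here is a small sketch of a WLS projection on a toy hierarchy of one chain total and two stores. The summing matrix, the forecasts, and the error variances are illustrative assumptions, and the diagonal weight matrix is the simplest MinT-style variant, not necessarily the one used in production.

```python
import numpy as np

def reconcile_wls(base_forecasts, summing_matrix, error_variances):
    """Project incoherent base forecasts onto the coherent subspace.

    reconciled = S (S' W^-1 S)^-1 S' W^-1 y_hat, with W diagonal.
    """
    S = np.asarray(summing_matrix, dtype=float)
    y_hat = np.asarray(base_forecasts, dtype=float)
    W_inv = np.diag(1.0 / np.asarray(error_variances, dtype=float))
    # Weighted least-squares estimate of the bottom-level series
    bottom = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv @ y_hat)
    # Rolling the bottom level back up guarantees the totals add up
    return S @ bottom

# Rows: chain total, store A, store B. Columns: store A, store B.
S = [[1, 1],
     [1, 0],
     [0, 1]]
base = [100.0, 55.0, 50.0]          # chain forecast disagrees with 55 + 50
reconciled = reconcile_wls(base, S, error_variances=[1.0, 1.0, 1.0])
print(reconciled, reconciled[1] + reconciled[2])
```

Levels with a smaller error variance are trusted more, so they move less during reconciliation; forecasts that already add up pass through unchanged.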

04

Per-SKU drift, targeted retraining

Supplier substitutions, format changes, regional remerchandising: all of it shows up as drift in a handful of SKUs while the rest of the catalog stays stable. A per-SKU drift score (population stability + residual distribution shift) flags only the SKUs that need a retrain, and a per-SKU retraining job runs without touching the hierarchy weights. Full-hierarchy retraining stays on a quiet cadence; the noisy stuff gets fixed in hours, not weeks.
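A sketch of the population-stability half of a drift score, computed here on forecast residuals. The function names, the decile binning, and the 0.25 threshold (a common rule of thumb) are assumptions; how the production score combines this with the residual-shift term is not specified in this case file.

```python
import numpy as np

def population_stability_index(reference, recent, bins=10, eps=1e-6):
    """PSI between a reference window and a recent window of one SKU's residuals."""
    reference = np.asarray(reference, dtype=float)
    recent = np.asarray(recent, dtype=float)
    # Interior bin edges come from the reference window's quantiles
    inner_edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))[1:-1]
    ref_bins = np.searchsorted(inner_edges, reference, side="right")
    new_bins = np.searchsorted(inner_edges, recent, side="right")
    ref_share = np.bincount(ref_bins, minlength=bins) / len(reference)
    new_share = np.bincount(new_bins, minlength=bins) / len(recent)
    # Clip empty bins so the log ratio stays finite
    ref_share = np.clip(ref_share, eps, None)
    new_share = np.clip(new_share, eps, None)
    return float(np.sum((new_share - ref_share) * np.log(new_share / ref_share)))

def needs_retrain(reference, recent, threshold=0.25):
    """Flag a single SKU for targeted retraining (threshold is an assumed default)."""
    return population_stability_index(reference, recent) > threshold

rng = np.random.default_rng(0)
stable = rng.normal(0.0, 1.0, size=2000)
print(needs_retrain(stable, stable), needs_retrain(stable, stable + 2.0))
```

Because the score is computed per SKU, a flag here only queues that SKU's retraining job and leaves the hierarchy weights alone.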

05

Constraints at the recommendation seam

Order cutoffs, MOQs, case packs, shelf life, supplier lead times: none of these belong inside the demand model. The forecast emits a clean expected-demand number per SKU per store per day; a separate recommendation layer applies the constraints and rounds to an actually-orderable quantity. Clean separation means a constraint change (new supplier, new MOQ) doesn't poison the historical model fit.
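The seam can be sketched as a pure function that takes the forecast as an input and never feeds back into it. Everything below (the `OrderConstraints` fields, `orderable_quantity`, the rounding rules, and the numbers) is a hypothetical illustration, not the production recommendation engine.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderConstraints:
    case_pack: int        # units per case
    moq: int              # minimum order quantity, in units
    shelf_life_days: int  # do not order cover beyond this horizon

def orderable_quantity(expected_daily_demand, cover_days, on_hand, constraints):
    """Turn a clean demand number into a quantity a supplier will accept."""
    # Never buy more days of cover than the product will last on the shelf
    horizon = min(cover_days, constraints.shelf_life_days)
    need = expected_daily_demand * horizon - on_hand
    if need <= 0:
        return 0
    # Round up to whole cases
    units = math.ceil(need / constraints.case_pack) * constraints.case_pack
    # Lift small orders to the MOQ, still in whole cases
    if units < constraints.moq:
        units = math.ceil(constraints.moq / constraints.case_pack) * constraints.case_pack
    return units

olives = OrderConstraints(case_pack=12, moq=24, shelf_life_days=30)
print(orderable_quantity(3.2, cover_days=7, on_hand=5, constraints=olives))
```

A new supplier or MOQ only changes the `OrderConstraints` values; the demand history the model was fitted on is untouched.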

/ 03
Architecture

Signals come in from the top (weather, macro, promo, regional events) and feed a causal signal layer with proper lag alignment. The hierarchy (SKU × store × season) enters from the left and feeds the base hierarchical model, where Prophet-ensemble fits run alongside a BSTS trend decomposition and long-tail shrinkage. Reconciliation enforces coherence across the hierarchy before anything reaches a human. From there the forecast splits: the category-manager dashboard exposes what-if toggles for trust; the order-recommendation engine applies operational constraints (MOQs, cutoffs, shelf life) to convert demand into actually-orderable quantities. A drift loop along the bottom taps realized demand, scores per-SKU drift, and triggers targeted retraining without touching the rest of the hierarchy.

FIG. i · aisle 19 · architecture. Signals enter from the top; hierarchy from the left; drift tapped at the bottom; weight refresh loops back to base model. Constraints live at the order-recommendation seam, never in the forecast.

/ 04
Challenges
Challenge · 01

Long-tail SKUs had so little per-SKU signal that the legacy model defaulted to safety stock and stayed there year after year.

Resolution

Hierarchical shrinkage: long-tail SKUs borrow strength from their category × region group. The shrinkage strength is learned per group, so a niche category with stable buyers shrinks differently than a churn-heavy one. The long tail stops getting rounded up just because it's quiet.

Challenge · 02

Promo and weather are noisy at the daily level; raw signals dragged the forecast around in ways the team couldn't justify.

Resolution

A causal signal layer with searched lag features (per category) and a sparsity prior on signal coefficients. Signals that don't carry real predictive weight in a given category get zeroed out; the ones that do show their contribution as an attributable delta the category manager can see.
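One way to get the zeroing-out behavior is an L1 penalty on the signal coefficients, which corresponds to a Laplace-style sparsity prior. The sketch below fits it with plain proximal gradient descent (ISTA). The choice of L1, the function names, and the synthetic data are assumptions; this case file does not say which sparsity prior the production layer uses.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink values toward zero and clamp anything within t of zero to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fit_sparse_signals(X, y, penalty, n_iter=500):
    """Minimise (1/2n)||y - Xb||^2 + penalty * ||b||_1 by proximal gradient."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's slope
    step = n / (np.linalg.norm(X, 2) ** 2)
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ coef - y) / n
        coef = soft_threshold(coef - step * grad, step * penalty)
    return coef

# Synthetic category: only the first of four lagged signals moves the residual
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=300)
print(fit_sparse_signals(X, y, penalty=0.2))
```

Each surviving coefficient times its signal value is the attributable delta for that signal; the zeroed ones contribute nothing and can be hidden from the manager's view.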

Challenge · 03

Drift was constant (supplier substitutions, regional remerchandising, format changes), and a full-hierarchy retrain was expensive enough that it lagged the drift by weeks.

Resolution

Online per-SKU drift detector (population stability + residual distribution shift) flags only the affected SKUs and triggers a per-SKU retraining job. Hierarchy weights stay stable; the noisy SKUs converge in hours; full retrain stays on a quiet cadence.

Challenge · 04

Category managers didn't trust the model. The previous system was a black box that had been wrong often enough that they routinely overrode its orders.

Resolution

A what-if dashboard exposes the signal toggles directly. Managers can see, per SKU, how much weather is adding, how much promo is contributing, what the forecast looks like without macro factored in. Trust came from visibility, not from a leaderboard number. Adoption hit 90% inside six months.
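Because the signal layer is additive, a what-if toggle reduces to summing only the enabled contributions. A minimal sketch, with a hypothetical `what_if` helper and made-up per-signal deltas for one SKU on one day:

```python
def what_if(base_forecast, contributions, enabled):
    """Return the forecast with only the enabled signals, plus the deltas that were applied."""
    active = {name: delta for name, delta in contributions.items() if name in enabled}
    return base_forecast + sum(active.values()), active

# Illustrative numbers only: units of expected demand added by each signal
contributions = {"weather": 3.5, "promo": 6.0, "events": 0.0, "macro": -1.5}

full, _ = what_if(40.0, contributions, enabled=set(contributions))
no_macro, applied = what_if(40.0, contributions, enabled={"weather", "promo", "events"})
print(full, no_macro, applied)
```

The difference between the two forecasts is exactly the macro contribution, which is what makes the toggle readable to a category manager.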

Challenge · 05

Operational realities (order cutoffs, MOQs, case packs, shelf life) kept leaking into the forecast and corrupting the historical fit.

Resolution

Hard separation. The demand model emits a clean expected-demand number; a downstream recommendation layer applies the constraints and rounds to orderable quantities. Constraints change all the time; the forecast stays clean across the change.

/ 05
Tech stack

Forecast

  • Prophet ensemble, per-SKU and per-group
  • Custom hierarchical reconciliation (MinT-style WLS)
  • BSTS for trend decomposition on top categories

Drift + retraining

  • Evidently + custom per-SKU drift score
  • Online retraining triggers, per-SKU isolation
  • Quiet-cadence full-hierarchy refit (monthly)

Signals

  • Weather API, provider-agnostic adapter
  • Macro feeds (CPI, FX, regional unemployment)
  • Internal promo store + regional event calendar

Serving

  • FastAPI for on-demand forecast + recommendation
  • Redis cache for hot forecasts (planner UI)
  • Batch overnight + on-demand re-fit modes

Dashboards

  • Next.js category-manager portal
  • Recharts for what-if visualization
  • Per-SKU signal-contribution view at order time

/ 06
Try it

Forecast Playground

Three SKU profiles (fast-mover, mid, long-tail), each with 12 months of real-shape demand and two overlaid forecasts: the legacy line (dashed, muted, often overshoots) and the new hierarchical line (solid, brand). Toggle the four causal signals on and off; the new forecast visibly re-fits while the legacy line never moves. The three stat tiles recompute live so the signal contribution lands in concrete inventory and MAPE numbers, not abstraction.

[Chart: monthly demand, M1 to M12 · series: actual, legacy, new hierarchical]
Inventory held: $216 (legacy $2,502)
Stockout days: 12d (legacy 6d)
Forecast MAPE: 6.7% (legacy 91.4%)

Operator note: Toggle a signal and the new forecast visibly re-fits; the legacy line stays where it always was. The long-tail SKU is the clearest tell: legacy rounds to safety stock and over-orders every month; the hierarchical forecast borrows strength from the SKU's category × region group and stays close to realized demand even with sparse signal. Category managers buy in not because of a leaderboard number but because they can see, line by line, what each signal contributes before they commit the order.

/ 07
Outcomes
Inventory holding
-18%
Long-tail variance cut
50%
Cat-mgr adoption
90%
Stores live
1,200
FIG. ii · 12 months post-deploy. Series: inventory holding (index, M1 = 100) and long-tail forecast variance (index), M1 to M12, with deploy and baseline markers; end values 82 and 50.
  • Inventory holding down 18% across the catalog within two quarters
  • Forecast variance on long-tail SKUs halved against the legacy baseline
  • Category-manager adoption hit 90% inside 6 months, mostly via the what-if dashboard
  • Full rollout across 1,200 stores on a single live banner with two more in pilot
Footprint

Rolled out to [REDACTED] stores across multiple banners. Per-SKU drift detection runs continuously; the full-hierarchy refit runs on a monthly cadence. Reviewed quarterly against the merchandising plan and the inventory P&L.

Evidence

Live across 1,200 stores on one banner with two more in pilot. Per-SKU drift retraining continuous; full-hierarchy refit monthly. Anonymized case study available Q1 2026.
