AI Demand Planning at a Pharmaceutical Distributor: A Partial-Success Case Study

The Partial-Success Pattern: Why This Case Study Exists

Most AI demand planning deployments at pharmaceutical distributors do not fail outright. They produce something more disorienting: measurable gains in some product categories, persistent underperformance in others, eroded planner trust, and a growing gap between what the technology promised and what operations can actually use. That is the partial-success pattern, and it is the dominant documented outcome — not an edge case.

This case study exists because the partial-success pattern is poorly served by the available literature. Vendor case studies document full successes. Post-mortems document categorical failures. The middle ground — where the AI model works on stable SKUs, stalls on complex ones, and the deployment team is left diagnosing which problem to fix first — rarely gets the structured treatment practitioners need.

The article is designed as a diagnostic tool. If your deployment is stalled, the goal is to help you identify which failure mode you are actually dealing with — data-structural, governance-related, or adoption-related — because each requires a different remediation path. If you are pre-go-live, the goal is to surface the conditions that make partial success predictable so you can address them before they materialize.

Figure: The partial-success pattern. Stable SKU categories show strong AI forecast performance, while high-complexity product lines sit in warning states or are disconnected from the model. (Illustration: a pharmaceutical distribution warehouse alongside an AI demand planning dashboard showing mixed performance across product categories.)

Case Context: The Distributor Profile

The composite distributor in this case is a mid-size pharmaceutical wholesaler operating in the $800M–$2B annual revenue range. It serves a mix of independent pharmacies, small regional health systems, and long-term care facilities across a multi-state footprint. Its product catalog spans roughly 15,000 to 25,000 active SKUs across branded pharmaceuticals, generic equivalents, specialty injectables, over-the-counter products, and a growing short-dating liquidation segment.

The ERP environment is a legacy system — in this composite, a mid-market ERP with pharmaceutical distribution modules — supplemented by a separate order management system, a 340B split-billing platform serving eligible covered entities, and a warehouse management system that does not share a unified master data layer with the ERP. This fragmented stack is not unusual for distributors of this size; it reflects years of acquisitions, customer-driven platform requirements, and deferred infrastructure investment.

The motivation for AI demand planning investment was straightforward: forecast accuracy on generic substitution events was poor, short-dating write-offs were increasing, and the planning team was spending an unsustainable portion of its time manually adjusting statistical forecasts for shortage-managed products. Leadership expected AI to reduce that manual burden and improve service levels on high-velocity SKUs.

Why Pharmaceutical Distribution Is Structurally Harder

The operational conditions that make pharmaceutical distribution difficult for AI demand planning are not incidental. They are structural features of the business that any AI system must account for to produce usable recommendations.

Short-dating SKUs: Products within 90–180 days of expiration require fundamentally different demand logic than standard inventory. Demand is often accelerated or discounted, and the AI model must distinguish short-dating-driven velocity from genuine demand signals.
Regulatory holds: A product flagged for a regulatory hold — FDA safety alert, manufacturer recall notice, or controlled substance restriction — becomes operationally unusable regardless of the AI's demand forecast. If the hold flag is not surfaced to the AI system in real time, the model will continue recommending replenishment on a product that cannot be shipped.
Multi-tier contract pricing: Pharmaceutical distributors operate under complex contract structures — WAC, chargeback-eligible pricing, 340B pricing, GPO contracts — that affect demand patterns at the customer level. Demand for the same SKU can vary significantly across customer segments based on pricing eligibility, and the AI model needs to understand those segments to forecast accurately.
Generic substitution events: When a branded product loses exclusivity or a preferred generic goes on shortage, demand shifts rapidly and non-linearly across the substitution set. These events are partially predictable but require the AI to incorporate substitution eligibility data that is rarely clean in legacy ERP systems.
Shortage-management-driven non-linear demand: During shortage events, ordering behavior becomes defensive: customers order above true demand to secure allocation. This creates demand signals that are systematically misleading for AI models trained on historical order data, because that history conflates allocation-driven orders with genuine consumption.
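
One way to keep allocation-driven orders from contaminating training data is to mask order lines that fall inside known shortage windows. The sketch below is illustrative only: the `shortage_windows` lookup, the SKU codes, and the row layout are assumptions, not details from the case, and a real deployment would source the windows from whatever system tracks shortage status.

```python
from datetime import date

# Hypothetical order history rows: (sku, order_date, units_ordered).
orders = [
    ("GEN-001", date(2023, 1, 10), 100),
    ("GEN-001", date(2023, 2, 14), 260),  # placed while the SKU was on allocation
    ("GEN-001", date(2023, 3, 12), 110),
]

# Hypothetical shortage windows per SKU as inclusive (start, end) date pairs.
shortage_windows = {"GEN-001": [(date(2023, 2, 1), date(2023, 2, 28))]}


def in_shortage_window(sku: str, order_date: date) -> bool:
    """Return True when the order date falls inside any shortage window for the SKU."""
    return any(start <= order_date <= end
               for start, end in shortage_windows.get(sku, []))


# Split the history so allocation-driven orders can be excluded or down-weighted.
training_rows = [o for o in orders if not in_shortage_window(o[0], o[1])]
flagged_rows = [o for o in orders if in_shortage_window(o[0], o[1])]
```

Whether flagged rows should be dropped, down-weighted, or replaced with an estimate of true consumption is a modeling decision this sketch does not settle.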

Deployment Approach: Tooling, Integration, and Rollout Phasing

The AI demand planning platform selected was a SaaS-based forecasting and inventory optimization tool with documented pharmaceutical distribution customers. Selection was driven primarily by the vendor's ability to demonstrate forecast performance on intermittent-demand SKUs and its published ERP connector for the distributor's legacy system. Integration complexity was acknowledged during vendor selection but underestimated in the project plan.

The integration architecture required connections to three systems: the ERP (primary source for order history, inventory positions, and supplier lead times), the order management system (customer-level demand history and pricing tier assignments), and the 340B split-billing platform (for covered entity demand segmentation). The warehouse management system was explicitly descoped from phase one due to integration complexity.

Planned rollout phasing at go-live. Phase 2 and Phase 3 timelines slipped significantly due to issues that emerged in Phase 1.
Phase	Scope	Product Categories	Integration Points Active	Timeline
Phase 1 Go-Live	Core forecasting on high-velocity SKUs	Stable generics, chronic-care branded	ERP only	Months 1–4
Phase 2 Planned	Expand to specialty and short-dating	Specialty injectables, short-dating generics	ERP + OMS	Months 5–9
Phase 3 Planned	Full portfolio coverage including 340B	All categories including 340B-eligible	ERP + OMS + 340B platform	Months 10–14

Phase one was intentionally scoped to the categories most likely to succeed: stable, high-volume generics and predictable chronic-care branded products. This was a reasonable risk-management decision. It became a problem when the deployment team treated phase one success as validation of the platform's readiness for phase two expansion without resolving the underlying data and governance issues that phase one had quietly surfaced.

Where AI Worked: Stable SKUs and Predictable Demand

In phase one's target categories, the AI model outperformed the legacy statistical forecasting method. The improvement was real and operationally meaningful for the categories in scope.

Stable generic products — high-volume, multi-source, with 24+ months of clean order history — are close to ideal conditions for ML-based demand forecasting. Demand patterns are relatively linear, promotional effects are minimal, and the ERP data quality for these SKUs tends to be higher because they have been actively managed for longer. The AI model's ability to incorporate external signals (seasonality patterns, regional dispensing trends) added incremental accuracy beyond what the legacy method could achieve.

Chronic-care branded products — antihypertensives, statins, diabetes medications — showed similar patterns. Demand is driven by patient adherence and prescription renewal cycles, which are relatively stable and predictable. The AI model's handling of day-of-week and end-of-month ordering patterns improved fill-rate consistency on these SKUs without requiring significant planner intervention.

Conditions that enabled AI success in these categories:

24+ months of clean, consistent order history in the ERP with minimal data gaps.
Single-source or dual-source supply with predictable lead times — no shortage-driven demand distortion in the historical data.
No active regulatory hold history on these SKUs during the training data period.
Pricing tier stability — these SKUs were not subject to frequent chargeback renegotiations or 340B eligibility changes that would fragment the demand signal.
Planner familiarity — planners already had intuitive models for these SKUs and could evaluate AI recommendations without needing to understand the model's logic.

Where It Stalled: Four Core Failure Modes

The stall in phase two was not a single failure. It was four overlapping failures that compounded each other. Each one is documented below with equal weight — the tendency to frame AI deployment failures as primarily a change management or people problem obscures the data-structural and governance failures that are often the actual root cause.

Failure Mode 1: Data Harmonization Breakdown

When the AI platform was extended to specialty and short-dating SKUs, the ERP data quality issues that had been manageable in phase one became disqualifying. The core problem was that the AI platform and the ERP did not share a consistent SKU identifier structure across all product categories.

Specialty injectables had been added to the ERP catalog through a series of supplier onboarding processes that used different item numbering conventions. The order management system had its own customer-facing product codes that mapped to ERP item numbers through a translation table — a translation table that had not been maintained consistently and contained approximately 340 unmapped or incorrectly mapped entries across the specialty catalog.

The practical effect: the AI platform was receiving demand history for some specialty SKUs that was actually an aggregate of multiple distinct products, and receiving no demand history for others because the identifier mismatch caused records to be dropped during the data extraction. Forecasts generated on this data were not just inaccurate — they were structurally unreliable in ways that were not immediately visible to planners reviewing the output.
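
Both effects described above (merged history and silently dropped records) can be reproduced with a toy extraction. This is a minimal sketch under assumed data: the product codes, the translation table layout, and the three-month history are all hypothetical.

```python
from collections import defaultdict

# Hypothetical OMS-to-ERP translation table (OMS product code -> ERP item number).
# None marks an entry that was never mapped.
translation = {
    "OMS-INJ-100": "ERP-7001",
    "OMS-INJ-101": "ERP-7001",  # distinct product mapped to the same ERP item
    "OMS-INJ-102": None,
}

# Hypothetical monthly demand history keyed by OMS code.
oms_demand = {
    "OMS-INJ-100": [40, 42, 41],
    "OMS-INJ-101": [5, 0, 12],
    "OMS-INJ-102": [8, 9, 7],
}

merged = defaultdict(lambda: [0, 0, 0])  # what the forecasting platform would see
dropped = []                             # history lost during extraction
for oms_code, history in oms_demand.items():
    erp_item = translation.get(oms_code)
    if erp_item is None:
        dropped.append(oms_code)
        continue
    merged[erp_item] = [a + b for a, b in zip(merged[erp_item], history)]

# ERP items fed by more than one OMS code are candidates for manual review.
sources = defaultdict(list)
for oms_code, erp_item in translation.items():
    if erp_item is not None:
        sources[erp_item].append(oms_code)
collisions = {erp: codes for erp, codes in sources.items() if len(codes) > 1}
```

A many-to-one mapping is not always an error, since several customer-facing codes can legitimately point at one item, so `collisions` is best read as a review queue rather than a defect list.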

Even the most advanced AI systems are only as effective as the data they operate on. In fragmented, inconsistent, or siloed environments, these systems become unreliable, brittle, or outright useless. — Logistics Viewpoints

The short-dating segment had a different but equally serious data problem: the ERP did not consistently record lot-level expiration dates in a field that the AI platform's data connector was configured to read. Short-dating status was tracked in a separate inventory module that was not included in the initial integration scope. The AI platform was therefore generating demand forecasts for short-dating products without any visibility into their remaining shelf life — producing recommendations that were algorithmically coherent but operationally unusable.
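
To make the missing input concrete, the sketch below classifies lots by remaining shelf life. It assumes a 180-day cutoff (the upper end of the 90 to 180 day range cited earlier) and hypothetical lot records. The key design point is that a missing expiration date is reported as "unknown" rather than silently treated as standard inventory.

```python
from datetime import date
from typing import Optional

SHORT_DATED_MAX_DAYS = 180  # assumed cutoff; actual policy varies by distributor


def shelf_life_status(expiration: Optional[date], as_of: date) -> str:
    """Classify a lot by days remaining until expiration."""
    if expiration is None:
        return "unknown"  # surface the data gap instead of hiding it
    remaining = (expiration - as_of).days
    if remaining < 0:
        return "expired"
    return "short_dated" if remaining <= SHORT_DATED_MAX_DAYS else "standard"


# Hypothetical lots: (lot_id, expiration_date).
lots = [
    ("LOT-A", date(2024, 9, 30)),
    ("LOT-B", date(2025, 6, 30)),
    ("LOT-C", None),
]
as_of = date(2024, 6, 1)
statuses = {lot_id: shelf_life_status(exp, as_of) for lot_id, exp in lots}
```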

Failure Mode 2: Absence of Operational Context in the AI Model

Even where SKU identifiers were clean, the AI model was operating without the operational context that planners use to make decisions. Four specific context gaps caused the most damage.

Regulatory hold status: The ERP tracked regulatory holds in a status field that was not included in the AI platform's data feed. The AI continued generating replenishment recommendations for products under active FDA safety alerts or manufacturer recall notices. Planners had to manually cross-check the AI output against a separately maintained hold list — a process that reinforced the perception that the AI was not safe to act on without manual verification.
Short-dating alerts: As noted above, lot-level expiration data was not flowing to the AI platform. The model therefore could not distinguish normal demand from the accelerated sell-through that planners push on short-dating inventory, and its replenishment recommendations for short-dating SKUs were systematically overstated.
Generic substitution eligibility: The AI platform had no access to the substitution tables that planners use to manage demand shifts when a preferred generic goes on shortage. When a shortage event occurred, the AI model saw a demand drop on the shorted SKU and a demand spike on the substitution target — but without substitution eligibility data, it could not model the relationship between those signals or forecast the duration of the substitution period.
Contract pricing tier assignments: Customer-level pricing tier data from the order management system was not fully integrated in phase one. For specialty products with significant price variation across customer segments, this meant the AI was forecasting aggregate demand without understanding the customer mix — producing forecasts that were accurate at the total level but unusable for customer-specific replenishment decisions.
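
The substitution gap in particular is easy to illustrate. In the sketch below, which uses a hypothetical substitution group and made-up weekly volumes, each SKU viewed alone shows a collapse or a spike, while demand summed across the substitution set is nearly flat. That group-level view is what the model could not construct without eligibility data.

```python
# Hypothetical substitution group: group id -> interchangeable SKUs.
substitution_groups = {"GRP-EXAMPLE": ["GEN-A", "GEN-B"]}

# Hypothetical weekly unit demand per SKU.
weekly_demand = {
    "GEN-A": [100, 100, 20, 10],   # preferred generic goes on shortage in week 3
    "GEN-B": [30, 30, 105, 120],   # substitution target absorbs the shifted demand
}


def group_demand(group_id: str) -> list:
    """Sum weekly demand across every SKU in a substitution group."""
    members = substitution_groups[group_id]
    return [sum(week) for week in zip(*(weekly_demand[sku] for sku in members))]


total = group_demand("GRP-EXAMPLE")
```

Forecasting at the group level and then allocating to SKUs by availability is one possible design; the case does not say how the vendor's platform models the relationship.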

Failure Mode 3: Planner Override Escalation and Shadow Excel Processes

Planner adoption failure followed a predictable escalation sequence. It did not begin with rejection — it began with reasonable skepticism that was never resolved, which then hardened into systematic distrust.

In the first weeks after go-live on specialty categories, planners identified specific AI recommendations that were clearly wrong — replenishment suggestions for products on regulatory hold, overstock recommendations for short-dating inventory that needed to be moved rather than replenished. They overrode those recommendations, which was the correct action. But because no process existed to capture the rationale for overrides or feed that information back to the model, each override was invisible to the system.

Over the following months, the override rate climbed. Planners began maintaining their own tracking sheets to document which AI recommendations they had overridden and why — initially as a personal audit trail, then as a shared team resource. By month six of phase two, two senior planners had rebuilt their full specialty category planning workflows in Excel, using the AI platform's output as a reference data point rather than a decision input. The AI investment was being paid for while being systematically bypassed.

This is the shadow Excel pattern, and it is not primarily a change management failure. It is a rational response to a system that is producing operationally unusable recommendations in a regulated environment where acting on a bad recommendation has real consequences — regulatory exposure, customer service failures, write-off risk. Planners who bypass the AI are not being resistant to change; they are protecting the operation.

Failure Mode 4: Governance Gap Before Go-Live

The deployment team did not define decision rights between AI recommendation and human action before go-live on specialty categories. The question of which AI recommendations could be acted on autonomously, which required planner review, and which required supervisor approval was left unresolved — implicitly delegated to individual planners to work out in practice.

In a stable-SKU environment with clean data, this ambiguity is manageable — planners can develop informal norms over time. In a specialty pharmaceutical distribution environment with regulatory hold exposure, short-dating risk, and shortage-management complexity, the absence of a governance framework meant that every planner was making independent decisions about how much to trust the AI output. The result was inconsistent adoption patterns across the team and no organizational learning about which AI recommendations were reliably actionable.

The Insight-to-Action Gap: Why Forecast Accuracy Did Not Move Inventory Metrics

The AI platform's forecast accuracy on stable SKUs was genuinely better than the legacy method's, and the deployment team could demonstrate this in model evaluation metrics. But service levels and inventory turns on the specialty portfolio did not improve during the phase two period. The accuracy gains did not translate into operational outcomes.

The mechanism is straightforward once named. The value chain that produces operational outcomes from AI demand planning runs: insight → decision → operational change → financial result. The AI platform was generating better insights on the SKUs it had clean data for. But the decision layer (the planners who would act on those insights) had no reliable way to tell trustworthy AI recommendations apart from those that were operationally unusable because of missing context. The rational response was to treat all AI recommendations with elevated skepticism, which meant the good recommendations were discounted along with the bad ones.

The real barrier is that AI often lacks the operational context required to participate in decisions the way the business actually makes them. Even when the alert is correct, the recommendation often stops short of action because the system does not understand the decision environment well enough to operate within it. — SupplyChainBrain

In the pharmaceutical distributor context, this gap has a specific structure. A regulatory hold flag missing from the AI's data feed does not just affect the recommendation for the held product — it undermines planner confidence in all AI recommendations, because planners cannot be certain which other recommendations are similarly contaminated by missing operational context. The trust failure is systemic, not product-specific.

Figure: The insight-to-action gap. AI-generated demand signals are produced but cannot be acted on when the execution layer lacks the operational context to validate them. The result is a manual shadow-Excel workaround that bypasses the AI investment entirely. (Illustration: an AI forecast node on the left, a fragmented connector layer with broken links in the center, and a human planner working on a spreadsheet on the right.)

The insight-to-action gap is also a data architecture problem, not just an adoption problem. The AI platform in this deployment was not connected to the systems that hold the operational context planners need to make decisions — regulatory hold status lived in the ERP status field, short-dating alerts lived in the inventory module, substitution eligibility lived in a manually maintained table. Until those feeds were connected, the AI was generating recommendations in an operational vacuum.

Recovery Steps and Partial Course-Correction

After six months of phase two underperformance, the deployment team initiated a structured recovery effort. The effort had four components, executed in sequence over approximately nine months.

Component 1: Data Audit and Master Data Harmonization

The team conducted a full audit of the AI platform's data feeds against the ERP and OMS source systems. The audit identified 340 unmapped or incorrectly mapped specialty SKU identifiers, 12 product categories where demand history was being aggregated incorrectly due to item numbering inconsistencies, and three ERP fields containing operational context (regulatory hold status, lot expiration date, substitution eligibility flag) that were not included in any data feed to the AI platform.

The master data harmonization project took approximately four months and required dedicated involvement from both the ERP team and the AI platform's implementation support. The SKU identifier mapping was resolved through a combination of automated matching logic and manual review of the unresolvable cases. The three missing operational context fields were added to the data feed through new ERP connector configurations.
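
The case does not describe the matching logic that was used. As one plausible shape for it, the sketch below normalizes identifiers (uppercase, punctuation stripped), auto-matches codes that resolve to exactly one ERP item, and routes everything else to a manual review queue. All codes and the normalization rule are assumptions.

```python
import re


def normalize(code: str) -> str:
    """Uppercase the code and strip everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", code.upper())


# Hypothetical identifiers from each system.
erp_items = ["INJ-00731", "INJ-00845", "SP 1020"]
oms_codes = ["inj00731", "INJ/00845", "SP-1020-B", "XK-77"]

# Index ERP items by normalized key; a key may map to several items.
erp_index = {}
for item in erp_items:
    erp_index.setdefault(normalize(item), []).append(item)

auto_matched = {}
manual_review = []
for code in oms_codes:
    candidates = erp_index.get(normalize(code), [])
    if len(candidates) == 1:
        auto_matched[code] = candidates[0]  # unambiguous match
    else:
        manual_review.append(code)          # no match or ambiguous match
```

Keeping ambiguous matches out of the automated path matters here: a wrong automatic match would recreate the merged-history problem the audit was meant to remove.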

Component 2: Operational Context Enrichment

Adding regulatory hold status, lot expiration data, and substitution eligibility to the AI platform's data feed required more than connector configuration — it required the AI platform vendor to update its data model to incorporate these fields as planning constraints rather than just informational attributes. This was a non-trivial platform configuration effort that had not been scoped in the original implementation.

After enrichment, the AI platform was configured to suppress replenishment recommendations for products with active regulatory holds, to flag short-dating inventory for accelerated sell-through rather than standard replenishment, and to model substitution demand as a linked signal rather than independent demand spikes. These configurations materially improved the operational usability of AI recommendations on specialty SKUs.
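
Treating these fields as planning constraints rather than informational attributes means they change the recommended action, not just the label beside it. The sketch below shows the idea with a hypothetical `Recommendation` record and action names; it is not the vendor's data model.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    sku: str
    suggested_units: int
    on_regulatory_hold: bool
    short_dated: bool


def apply_constraints(rec: Recommendation) -> tuple:
    """Return (action, units) after applying hold and short-dating constraints."""
    if rec.on_regulatory_hold:
        return ("suppress", 0)       # held product cannot ship; never replenish
    if rec.short_dated:
        return ("sell_through", 0)   # move existing stock instead of buying more
    return ("replenish", rec.suggested_units)


held = apply_constraints(Recommendation("SP-001", 50, True, False))
aging = apply_constraints(Recommendation("GEN-014", 80, False, True))
normal = apply_constraints(Recommendation("GEN-002", 120, False, False))
```

The hold check runs first by assumption, so a product that is both held and short-dated is suppressed rather than pushed for sell-through.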

Component 3: Governance Framework Design

The team designed a tiered decision rights framework defining three categories of AI recommendation: auto-execute (stable generics below a defined order value threshold, no active holds, no short-dating flags), planner-review (all specialty SKUs, any SKU with an active hold or short-dating flag, any recommendation above the auto-execute threshold), and supervisor-approval (shortage-managed products, any recommendation involving a substitution event, orders above a defined value ceiling).

The framework was documented, reviewed with the planning team, and embedded in the AI platform's workflow configuration so that recommendations surfaced to planners already carried their governance tier designation. This reduced the cognitive burden on planners who had previously been making governance judgments on every recommendation individually.
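
The three tiers can be expressed as a small routing function. In this sketch the dollar thresholds, field names, and category labels are hypothetical, and evaluating the strictest tier first is an assumption about precedence that the case does not state.

```python
from dataclasses import dataclass

AUTO_EXECUTE_MAX_VALUE = 5_000     # hypothetical auto-execute order value threshold
SUPERVISOR_VALUE_CEILING = 50_000  # hypothetical value ceiling


@dataclass
class TieredRecommendation:
    category: str              # e.g. "stable_generic" or "specialty"
    order_value: float
    active_hold: bool = False
    short_dated: bool = False
    shortage_managed: bool = False
    substitution_event: bool = False


def governance_tier(rec: TieredRecommendation) -> str:
    """Assign a decision-rights tier, checking the strictest tier first."""
    if (rec.shortage_managed or rec.substitution_event
            or rec.order_value > SUPERVISOR_VALUE_CEILING):
        return "supervisor-approval"
    if (rec.category == "stable_generic"
            and rec.order_value < AUTO_EXECUTE_MAX_VALUE
            and not rec.active_hold
            and not rec.short_dated):
        return "auto-execute"
    return "planner-review"  # default for specialty, flagged, or mid-value orders
```

Defaulting to planner-review means any category the rules do not explicitly name falls into the human-reviewed tier, which is the conservative choice in a regulated environment.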

Component 4: Change Management Reboot

The change management reboot was explicitly framed to the planning team as an acknowledgment that the initial deployment had asked planners to trust a system that did not yet have the operational context to earn that trust. The reboot was not a training program — it was a structured re-engagement that started with the data audit findings (showing planners specifically what had been wrong and what had been fixed) and then rebuilt adoption incrementally by category, starting with the categories where the enriched AI recommendations were most demonstrably usable.

The shadow Excel processes were not eliminated by mandate. They were reduced gradually as planners developed confidence that the AI recommendations on specific categories were reliable — a process that took approximately three to four months per category cohort after the data and governance fixes were in place.

Recovery outcomes at the time of this case record. Partial improvement is documented honestly; unresolved issues are not minimized.
Recovery Component	What Improved	What Remained Unresolved
Master data harmonization	Specialty SKU forecast reliability improved materially; identifier mapping errors eliminated	340B platform integration still descoped; customer-level pricing tier data partially incomplete
Operational context enrichment	Regulatory hold suppression working; short-dating flags active; substitution modeling improved	Shortage-management demand distortion in historical data not fully addressed; model retraining on clean data still in progress
Governance framework	Planner decision burden reduced; tiered workflow embedded in platform	Supervisor-approval tier inconsistently applied; escalation logic not fully tested at go-live
Change management reboot	Override rates declining on stable and chronic-care categories; shadow Excel reduced on two of five specialty cohorts	Two senior planners maintaining parallel Excel workflows on shortage-managed products; full adoption not achieved at case record date

Practitioner Takeaways: What to Do Before Go-Live

The four failure modes in this case are not unique to this distributor. They are predictable features of AI demand planning deployments in pharmaceutical distribution contexts. The following pre-deployment requirements emerge directly from the case's failure modes — not from abstract best practice.

1. Pre-Deployment Data Readiness Audit: Pharma-Specific Fields

A standard data readiness audit for AI demand planning checks order history completeness, SKU master data quality, and lead time data availability. In pharmaceutical distribution, that baseline is necessary but not sufficient. The following pharma-specific fields must be audited before go-live:

SKU identifier consistency across ERP, OMS, and any dispensing or 340B platform: verify that the same product is represented by a consistent identifier in every system that will feed the AI platform.
Regulatory hold status field: confirm it is populated consistently, updated in real time, and included in the AI platform data feed scope.
Lot-level expiration date availability: confirm that lot expiration data is captured at the item level in the ERP or WMS and is accessible to the AI platform's data connector.
Generic substitution tables: confirm that substitution eligibility relationships are maintained in a structured, queryable form — not in a planner's personal spreadsheet.
Customer pricing tier assignments: confirm that contract pricing tier data from the OMS is mappable to customer-level demand history in the AI platform.
Shortage-managed product flags: confirm that products currently under shortage management are flagged in a way that the AI platform can use to apply appropriate demand modeling logic.
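
Part of this audit can be automated as a field-completeness check over an extract of the item master. The sketch below assumes hypothetical field names and four sample records; it reports the share of records where each pharma-specific field is populated.

```python
# Hypothetical field names for the pharma-specific attributes listed above.
REQUIRED_FIELDS = [
    "regulatory_hold_status",
    "lot_expiration_date",
    "substitution_group",
    "pricing_tier",
    "shortage_flag",
]

# Hypothetical item master extract; None or "" means the field is unpopulated.
records = [
    {"regulatory_hold_status": "none", "lot_expiration_date": "2025-01-31",
     "substitution_group": "GRP-1", "pricing_tier": "GPO", "shortage_flag": False},
    {"regulatory_hold_status": "none", "lot_expiration_date": None,
     "substitution_group": "GRP-1", "pricing_tier": "340B", "shortage_flag": False},
    {"regulatory_hold_status": "", "lot_expiration_date": None,
     "substitution_group": None, "pricing_tier": "WAC", "shortage_flag": True},
    {"regulatory_hold_status": "active", "lot_expiration_date": "2024-11-30",
     "substitution_group": None, "pricing_tier": "WAC", "shortage_flag": False},
]


def completeness(rows: list, fields: list) -> dict:
    """Return the fraction of rows with a populated value for each field."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in rows if r.get(field) not in (None, "")) / len(rows)
        for field in fields
    }


report = completeness(records, REQUIRED_FIELDS)
gaps = [field for field, share in report.items() if share < 1.0]
```

Completeness is only the first gate. A populated field can still be stale or wrong, so consistency and timeliness need their own checks.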

2. Operational Context Requirements: Confirm Before Go-Live

Before expanding AI demand planning beyond stable SKU categories, confirm that each of the following operational context feeds is active and tested in the AI platform:

Operational context requirements specific to pharmaceutical distribution AI demand planning deployments.
Operational Context	Source System	Go-Live Requirement	Risk If Missing
Regulatory hold status	ERP status field	Active feed, real-time or daily refresh	AI recommends replenishment on held products; planner trust collapse
Short-dating / lot expiration	ERP inventory module or WMS	Lot-level data mapped to AI platform	AI overstates replenishment on short-dating inventory; write-off risk
Generic substitution eligibility	Substitution table (ERP or standalone)	Structured feed, not manual spreadsheet	AI cannot model substitution demand shifts; shortage event forecasts unreliable
Customer pricing tier	OMS contract data	Customer-level mapping confirmed	Aggregate forecasts inaccurate for customer-specific replenishment
Shortage management flags	ERP or procurement system	Active flag with AI model configuration for shortage logic	Historical shortage-driven orders distort AI training data; forecast systematically overstated
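
"Active and tested" can be checked mechanically with a feed freshness gate. In the sketch below, the feed names and every maximum age except the daily refresh for hold status are assumptions. A feed that has never loaded is treated as stale.

```python
from datetime import datetime, timedelta

# Maximum tolerated age per feed. Only the daily hold-status refresh comes from
# the table above; the other limits are hypothetical.
MAX_FEED_AGE = {
    "regulatory_hold_status": timedelta(hours=24),
    "lot_expiration": timedelta(hours=24),
    "substitution_eligibility": timedelta(days=7),
    "customer_pricing_tier": timedelta(days=7),
    "shortage_flags": timedelta(hours=24),
}


def stale_feeds(last_refresh: dict, now: datetime) -> list:
    """Return feeds that are missing or older than their allowed age."""
    stale = []
    for feed, max_age in MAX_FEED_AGE.items():
        refreshed = last_refresh.get(feed)
        if refreshed is None or now - refreshed > max_age:
            stale.append(feed)
    return stale


now = datetime(2024, 6, 3, 8, 0)
last_refresh = {
    "regulatory_hold_status": datetime(2024, 6, 3, 2, 0),
    "lot_expiration": datetime(2024, 5, 30, 2, 0),       # several days old
    "substitution_eligibility": datetime(2024, 5, 29, 2, 0),
    "customer_pricing_tier": datetime(2024, 6, 1, 2, 0),
    # "shortage_flags" has never been loaded
}
blocked = stale_feeds(last_refresh, now)
```

A non-empty result could be used to hold back recommendations for the affected categories until the feed recovers, rather than letting the model run on outdated context.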

3. Governance Framework: Design Before Go-Live

Define AI recommendation decision rights before planners encounter their first AI output. The governance framework does not need to be complex, but it must answer three questions explicitly:

Which AI recommendations can be acted on without human review? (Define by category, order value threshold, and absence of hold/short-dating flags — not by planner judgment.)
Which recommendations require planner review before action? (Define the review criteria — what the planner is checking for, not just that review is required.)
Which recommendations require supervisor approval? (Define the escalation triggers — shortage-managed products, high-value orders, substitution events — and the approval workflow.)

Embed the governance tier designation in the AI platform's workflow so planners see it on every recommendation without having to apply the framework manually. Governance that lives in a document rather than in the system will not be applied consistently.

4. Change Management: Anticipate the Override Escalation Pattern

Planner override escalation is predictable. Plan for it explicitly rather than treating it as a failure signal when it occurs. Three specific practices reduce escalation risk:

Build an override capture process from day one. Every planner override should record the reason — not to police planners, but to identify systematic AI recommendation failures that require data or configuration fixes. Without override data, you cannot distinguish between a planner who is being resistant and a planner who is correctly identifying a model deficiency.
Start with the categories where the AI is most demonstrably reliable and expand incrementally. Asking planners to trust an AI system on their most complex, highest-risk product lines before it has demonstrated reliability on simpler categories is a governance failure, not a change management challenge.
Show planners the data audit findings before asking them to re-engage with the system after a stall. Planners who understand specifically what was wrong and what has been fixed are more likely to give the system a second evaluation than planners who are told the system has been improved without explanation.
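
An override capture process can start as a structured log with a required reason code. The sketch below uses hypothetical reason codes, planner identifiers, and SKUs. Tallying overrides by category and reason is what separates a systematic model deficiency from individual planner preference.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Override:
    sku: str
    category: str
    planner: str
    reason_code: str  # required, so every override carries its rationale
    overridden_on: date


# Hypothetical override log entries.
log = [
    Override("SP-001", "specialty", "planner_a", "regulatory_hold", date(2024, 3, 4)),
    Override("SP-007", "specialty", "planner_b", "regulatory_hold", date(2024, 3, 5)),
    Override("SP-010", "specialty", "planner_a", "short_dated_stock", date(2024, 3, 6)),
    Override("GEN-002", "stable_generic", "planner_c", "planner_judgment", date(2024, 3, 6)),
]

# Count overrides per (category, reason) pair to expose systematic failures.
reasons_by_category = Counter((o.category, o.reason_code) for o in log)
top_issue, top_count = reasons_by_category.most_common(1)[0]
```

In this toy log, the leading pair points at a missing hold feed on specialty SKUs, which is a data fix and not a training need.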

Conclusion: Partial Success as a Recoverable Starting Point

Partial deployment success in AI demand planning is not a failure state. It is a diagnostic state. The question it requires is not whether to continue with the deployment, but which failure mode is driving the underperformance — because each requires a different remediation path.

Three remediation paths based on root cause. In practice, most stalled deployments involve elements of all three — sequence remediation by addressing data-structural failures first, governance second, adoption third.
Root Cause Category	Diagnostic Signal	Primary Remediation	Recovery Timeline Expectation
Data-structural failure	AI recommendations are wrong on specific SKU categories; override rates clustered around particular product types; forecast accuracy metrics diverge from planner experience	Data audit → master data harmonization → operational context feed engineering	4–6 months for harmonization; model retraining adds 2–3 months on top
Governance-related failure	Inconsistent adoption across planning team; no shared norms for which AI recommendations to act on; escalations handled differently by different planners	Tiered decision rights framework design → platform workflow embedding → governance training	6–8 weeks for framework design; adoption normalization takes 2–3 months after embedding
Adoption-related failure	Shadow Excel processes growing; planners can articulate specific reasons for distrust; override rates high but rationale not captured	Override capture process → category-by-category trust rebuild → change management reboot anchored in data audit findings	3–4 months per category cohort after underlying data and governance issues resolved

In pharmaceutical distribution, these failures are predictable precisely because the environment makes them likely. Short-dating SKUs, regulatory holds, multi-tier contract pricing, and shortage-management demand patterns are not edge cases — they are the operational reality that any AI demand planning system must account for to be usable. Organizations that conduct the diagnostic work before go-live — auditing the pharma-specific data fields, confirming operational context feeds, designing governance frameworks, and planning for the override escalation pattern — are not eliminating the risk of partial success. They are ensuring that when partial success occurs, they have the vocabulary and the framework to recover from it.