The challenge

A major telecom operator had invested in Adobe Target but was using it as a personalisation tool rather than an experimentation platform. Tests were launched without documented hypotheses, ran for inconsistent durations, and were called significant on the strength of Target's built-in confidence display, without a clear understanding of what those numbers represented. The digital team had declared multiple winning tests over 18 months, yet revenue per visitor had not meaningfully changed. Leadership was questioning whether the experimentation program was producing real learning or simply confirming what teams already believed.

Diagnosing the measurement problem

The first engagement was a full audit of the 23 tests run over the previous 18 months. Every test was reviewed against three criteria: Was there a documented hypothesis with a specific, measurable prediction? Was the test run for a statistically appropriate duration? Was the primary success metric aligned with a business outcome rather than a proxy?

Only 4 of the 23 tests passed all three criteria. The majority had been called significant too early, used metrics that did not map to revenue, or were affected by uncontrolled confounding factors, including a major UI redesign deployed mid-experiment on three occasions.

Rebuilding the measurement foundation

We redesigned the Analytics-Target integration to ensure experiment exposure events were being tracked correctly through Adobe Analytics — enabling accurate segmentation and post-hoc analysis independent of Target's reporting interface.

A new experiment tracking schema was implemented in the data layer, capturing experiment name, variant name, and exposure timestamp as Adobe Analytics eVars. This allowed the analytics team to build Workspace analyses that crossed experiment data with revenue, customer segment, and acquisition channel — dimensions not available in Target's native reporting.
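
A minimal sketch of what that exposure event could look like, assuming the site uses the Adobe Client Data Layer pattern; the event name, field names, and the downstream mapping of those fields onto eVars are illustrative rather than the client's actual schema:

```typescript
// Illustrative exposure event pushed to the data layer at the moment a visitor
// is assigned to a variant. The mapping of these fields onto Adobe Analytics
// eVars would be configured in tag management, not shown here.
interface ExperimentExposure {
  event: "experiment-exposure";
  experimentName: string;    // e.g. "checkout-cta-colour"
  variantName: string;       // e.g. "variant-b"
  exposureTimestamp: string; // ISO 8601
}

declare global {
  interface Window {
    adobeDataLayer?: object[];
  }
}

export function trackExposure(experimentName: string, variantName: string): void {
  const exposure: ExperimentExposure = {
    event: "experiment-exposure",
    experimentName,
    variantName,
    exposureTimestamp: new Date().toISOString(),
  };
  window.adobeDataLayer = window.adobeDataLayer ?? [];
  window.adobeDataLayer.push(exposure);
}
```

With experiment and variant persisted as eVars, the Workspace analyses described above can break revenue, segment, and channel metrics down by variant without depending on Target's own reporting.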

Program governance and velocity

We introduced a lightweight experiment brief template covering hypothesis, primary metric, minimum detectable effect, required sample size, and planned duration; every test required a completed brief before launch approval. This added two days to the pre-launch process but eliminated the premature significance calls that had inflated the previous win rate.
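
To illustrate the sample-size step in the brief, the sketch below uses the standard two-proportion formula at a 5% two-sided significance level and 80% power; the baseline rate and MDE in the example are placeholders, not the client's figures:

```typescript
// Approximate visitors needed per variant to detect a relative lift (the MDE)
// over a baseline conversion rate, for a two-sided test of two proportions.
function requiredSampleSize(baselineRate: number, relativeMde: number): number {
  const zAlpha = 1.96; // 5% significance, two-sided
  const zBeta = 0.84;  // 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeMde);
  const pBar = (p1 + p2) / 2;
  const numerator =
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil((numerator ** 2) / ((p2 - p1) ** 2));
}

// Example: 4% baseline conversion, 10% relative MDE
// -> roughly 39,000-40,000 visitors per variant.
console.log(requiredSampleSize(0.04, 0.10));
```

Dividing that figure by the page's expected daily traffic per variant gives the planned duration for the brief, subject to the two-week floor in the governance rules below.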

Test velocity actually increased because the brief process forced teams to prioritise ideas with clear, measurable hypotheses — reducing the number of vaguely defined experiments that consumed platform capacity without producing actionable results.

  • Hypothesis documentation required before launch
  • Sample size calculated using MDE and baseline conversion rate
  • Minimum two-week run time regardless of early significance
  • Post-test analysis required before next test on same page area
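
A minimal sketch of how those rules could be enforced as an automated pre-launch gate; the brief fields mirror the list above, but the types and helper are illustrative rather than the team's actual tooling:

```typescript
// Illustrative pre-launch check against the governance rules above.
interface ExperimentBrief {
  hypothesis: string;                    // specific, measurable prediction
  primaryMetric: string;                 // business outcome, not a proxy
  requiredSampleSize: number;            // per variant, from MDE and baseline rate
  plannedDurationDays: number;           // derived from sample size and traffic
  priorTestOnSameAreaAnalysed: boolean;  // post-test analysis completed
}

function launchIssues(brief: ExperimentBrief): string[] {
  const issues: string[] = [];
  if (brief.hypothesis.trim() === "") {
    issues.push("Document a hypothesis before launch.");
  }
  if (brief.requiredSampleSize <= 0) {
    issues.push("Calculate sample size from MDE and baseline conversion rate.");
  }
  if (brief.plannedDurationDays < 14) {
    issues.push("Plan a minimum two-week run time, regardless of early significance.");
  }
  if (!brief.priorTestOnSameAreaAnalysed) {
    issues.push("Complete post-test analysis of the previous test on this page area.");
  }
  return issues; // an empty list clears the test for launch approval
}
```

Anything returned here goes back to the team before the test is allowed to consume platform capacity.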

Outcome

Within six months of the rebuilt program, the team had run 18 tests, a faster pace than the 23 tests of the previous 18 months and at significantly higher quality. Twelve produced clear, actionable results. Three confirmed winners were implemented, delivering a combined 12% lift in digital plan conversion. More importantly, the team now had a structured approach to experimentation whose value would continue to compound as testing matured.