Traditional A/B testing is broken. Teams run experiments for weeks, split traffic 50/50 between control and treatment, and lose countless conversions to underperforming variants. According to Optimizely research, e-commerce companies implementing AI-driven testing achieve 20-40% higher conversion improvements compared to traditional methods, while reducing test duration by 30-50%.
In this comprehensive guide, we'll explore how AI transforms A/B testing from a static, manual process into a self-optimizing system. From multi-armed bandit algorithms that intelligently allocate traffic to AI-generated variants that explore design spaces humans wouldn't consider, you'll learn to build testing systems that continuously improve without manual intervention.
Traditional A/B Testing vs. AI-Powered Optimization
Traditional A/B testing follows a rigid process: define a hypothesis, create variants, split traffic equally, wait for statistical significance, then deploy the winner. This approach has fundamental limitations that AI can address:
// traditional-ab-test.ts
// The problems with traditional A/B testing
interface TraditionalABTest {
control: Variant;
treatment: Variant;
trafficSplit: [number, number]; // Always 50/50
durationWeeks: number; // Fixed duration
sampleSizeRequired: number; // Pre-calculated
}
// Problem 1: Opportunity Cost
// If treatment is clearly better, we still show control to 50% of users
const opportunityCost = (
dailyVisitors: number,
testDurationDays: number,
controlConversionRate: number,
treatmentConversionRate: number,
averageOrderValue: number
): number => {
const controlVisitors = (dailyVisitors * 0.5) * testDurationDays;
const lostConversions = controlVisitors * (treatmentConversionRate - controlConversionRate);
return lostConversions * averageOrderValue;
};
// Example: 10,000 daily visitors, 14-day test
// Control: 2% conversion, Treatment: 2.5% conversion, AOV: $100
// Lost revenue: 10,000 * 0.5 * 14 * 0.005 * $100 = $35,000
// Problem 2: Fixed Duration
// Tests run until pre-determined end date, even if winner is clear
interface FixedDurationProblem {
earlyWinnerDetected: boolean; // Often true at day 3-4
remainingTestDays: number; // But we wait 10+ more days
confidenceLevel: number; // Already at 95%+
// Result: Weeks of suboptimal performance
}
// Problem 3: No Adaptation
// Cannot adjust based on user segments, time of day, etc.
interface NoAdaptationProblem {
mobileUsersBetterWithVariantA: boolean;
desktopUsersBetterWithVariantB: boolean;
// Traditional tests can't serve different variants to different segments
}
Multi-Armed Bandit Algorithms
The multi-armed bandit problem is a classic exploration-exploitation tradeoff. Named after slot machines (one-armed bandits), the algorithm must decide whether to "exploit" the arm that's currently performing best or "explore" other arms that might perform even better. Here's a complete implementation:
// multi-armed-bandit.ts
interface BanditVariant {
id: string;
name: string;
impressions: number;
conversions: number;
revenue: number;
}
interface BanditConfig {
strategy: 'epsilon-greedy' | 'ucb1' | 'thompson-sampling';
explorationRate?: number; // For epsilon-greedy
minSamplesPerVariant?: number;
}
class MultiArmedBandit {
private variants: Map<string, BanditVariant> = new Map();
private config: BanditConfig;
constructor(variants: string[], config: BanditConfig) {
this.config = config;
variants.forEach(id => {
this.variants.set(id, {
id,
name: id,
impressions: 0,
conversions: 0,
revenue: 0
});
});
}
// Select which variant to show
selectVariant(): string {
switch (this.config.strategy) {
case 'epsilon-greedy':
return this.epsilonGreedy();
case 'ucb1':
return this.ucb1();
case 'thompson-sampling':
return this.thompsonSampling();
default:
return this.thompsonSampling();
}
}
// Epsilon-Greedy: Simple but effective
// Explore with probability epsilon, exploit otherwise
private epsilonGreedy(): string {
const epsilon = this.config.explorationRate || 0.1;
// Ensure minimum samples for all variants
const minSamples = this.config.minSamplesPerVariant || 100;
for (const [id, variant] of this.variants) {
if (variant.impressions < minSamples) {
return id;
}
}
// Explore with probability epsilon
if (Math.random() < epsilon) {
const variantIds = [...this.variants.keys()];
return variantIds[Math.floor(Math.random() * variantIds.length)];
}
// Exploit: choose best performing variant
return this.getBestVariant();
}
// UCB1: Upper Confidence Bound
// Balances exploitation with uncertainty-based exploration
private ucb1(): string {
const totalImpressions = [...this.variants.values()]
.reduce((sum, v) => sum + v.impressions, 0);
if (totalImpressions === 0) {
const variantIds = [...this.variants.keys()];
return variantIds[0];
}
let bestVariant = '';
let bestUCB = -Infinity;
for (const [id, variant] of this.variants) {
if (variant.impressions === 0) {
return id; // Always try unexplored variants
}
const conversionRate = variant.conversions / variant.impressions;
const exploration = Math.sqrt(
(2 * Math.log(totalImpressions)) / variant.impressions
);
const ucb = conversionRate + exploration;
if (ucb > bestUCB) {
bestUCB = ucb;
bestVariant = id;
}
}
return bestVariant;
}
// Thompson Sampling: Bayesian approach
// Sample from posterior distribution of each variant
private thompsonSampling(): string {
let bestVariant = '';
let bestSample = -Infinity;
for (const [id, variant] of this.variants) {
// Beta distribution parameters
// Prior: Beta(1, 1) = uniform distribution
const alpha = variant.conversions + 1;
const beta = variant.impressions - variant.conversions + 1;
// Sample from Beta distribution
const sample = this.betaSample(alpha, beta);
if (sample > bestSample) {
bestSample = sample;
bestVariant = id;
}
}
return bestVariant;
}
// Beta distribution sampling via the gamma-ratio method
private betaSample(alpha: number, beta: number): number {
if (alpha <= 0 || beta <= 0) return 0.5;
// Use gamma distribution relationship
const gammaAlpha = this.gammaSample(alpha);
const gammaBeta = this.gammaSample(beta);
return gammaAlpha / (gammaAlpha + gammaBeta);
}
// Gamma distribution sampling using Marsaglia and Tsang's method
private gammaSample(shape: number): number {
if (shape < 1) {
return this.gammaSample(shape + 1) * Math.pow(Math.random(), 1 / shape);
}
const d = shape - 1 / 3;
const c = 1 / Math.sqrt(9 * d);
while (true) {
let x: number, v: number;
do {
x = this.normalSample();
v = 1 + c * x;
} while (v <= 0);
v = v * v * v;
const u = Math.random();
if (u < 1 - 0.0331 * (x * x) * (x * x)) {
return d * v;
}
if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) {
return d * v;
}
}
}
// Standard normal distribution sampling (Box-Muller)
private normalSample(): number {
const u1 = 1 - Math.random(); // avoid Math.log(0)
const u2 = Math.random();
return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}
// Record outcome
recordOutcome(variantId: string, converted: boolean, revenue: number = 0): void {
const variant = this.variants.get(variantId);
if (!variant) return;
variant.impressions++;
if (converted) {
variant.conversions++;
variant.revenue += revenue;
}
}
// Get best performing variant
private getBestVariant(): string {
let bestVariant = '';
let bestRate = -Infinity;
for (const [id, variant] of this.variants) {
if (variant.impressions === 0) continue;
const rate = variant.conversions / variant.impressions;
if (rate > bestRate) {
bestRate = rate;
bestVariant = id;
}
}
return bestVariant || [...this.variants.keys()][0];
}
// Get statistics for all variants
getStats(): VariantStats[] {
return [...this.variants.values()].map(v => ({
id: v.id,
name: v.name,
impressions: v.impressions,
conversions: v.conversions,
conversionRate: v.impressions > 0 ? v.conversions / v.impressions : 0,
revenue: v.revenue,
revenuePerVisitor: v.impressions > 0 ? v.revenue / v.impressions : 0,
confidence: this.calculateConfidence(v)
}));
}
// Rough heuristic: confidence that this variant's true rate exceeds zero.
// For comparing variants against each other, use a two-proportion test instead.
private calculateConfidence(variant: BanditVariant): number {
if (variant.impressions < 30) return 0;
const p = variant.conversions / variant.impressions;
const standardError = Math.sqrt((p * (1 - p)) / variant.impressions);
const zScore = p / standardError;
// Approximate confidence from z-score
return Math.min(0.999, this.normalCDF(zScore));
}
private normalCDF(x: number): number {
const a1 = 0.254829592;
const a2 = -0.284496736;
const a3 = 1.421413741;
const a4 = -1.453152027;
const a5 = 1.061405429;
const p = 0.3275911;
const sign = x < 0 ? -1 : 1;
x = Math.abs(x) / Math.sqrt(2);
const t = 1.0 / (1.0 + p * x);
const y = 1.0 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.exp(-x * x);
return 0.5 * (1.0 + sign * y);
}
}
interface VariantStats {
id: string;
name: string;
impressions: number;
conversions: number;
conversionRate: number;
revenue: number;
revenuePerVisitor: number;
confidence: number;
}
AI-Powered Automated Variant Generation
One of the most powerful applications of AI in A/B testing is automatically generating variant copy, designs, and layouts. Using large language models like GPT-4, we can create hundreds of test variants that explore the design space far beyond human imagination:
// variant-generator.ts
import OpenAI from 'openai';
interface VariantTemplate {
component: 'headline' | 'cta' | 'hero' | 'pricing' | 'testimonial';
currentContent: string;
context: {
product: string;
targetAudience: string;
valueProposition: string;
tone: 'professional' | 'casual' | 'urgent' | 'friendly';
};
constraints?: {
maxLength?: number;
mustInclude?: string[];
mustAvoid?: string[];
};
}
interface GeneratedVariant {
id: string;
content: string;
rationale: string;
predictedLift: number;
psychologicalPrinciple: string;
}
class AIVariantGenerator {
private openai: OpenAI;
private historicalData: Map<string, VariantPerformance[]> = new Map();
constructor(apiKey: string) {
this.openai = new OpenAI({ apiKey });
}
async generateVariants(
template: VariantTemplate,
count: number = 5
): Promise<GeneratedVariant[]> {
const prompt = this.buildPrompt(template, count);
const response = await this.openai.chat.completions.create({
model: 'gpt-4-turbo-preview',
messages: [
{
role: 'system',
content: `You are an expert conversion rate optimization specialist with deep knowledge of
persuasion psychology, including Cialdini's principles (reciprocity, commitment, social proof,
authority, liking, scarcity), cognitive biases, and emotional triggers.
Your task is to generate A/B test variants that are:
1. Significantly different from the control (not minor word changes)
2. Based on proven psychological principles
3. Appropriate for the target audience
4. Measurably testable
Return your response as valid JSON.`
},
{
role: 'user',
content: prompt
}
],
response_format: { type: 'json_object' },
temperature: 0.8 // Higher creativity
});
const result = JSON.parse(response.choices[0].message.content || '{}');
return this.processGeneratedVariants(result.variants || []);
}
private buildPrompt(template: VariantTemplate, count: number): string {
const historicalInsights = this.getHistoricalInsights(template.component);
return `
Generate ${count} A/B test variants for the following:
**Component Type:** ${template.component}
**Current Content (Control):** "${template.currentContent}"
**Context:**
- Product/Service: ${template.context.product}
- Target Audience: ${template.context.targetAudience}
- Value Proposition: ${template.context.valueProposition}
- Desired Tone: ${template.context.tone}
**Constraints:**
${template.constraints?.maxLength ? `- Maximum length: ${template.constraints.maxLength} characters` : ''}
${template.constraints?.mustInclude ? `- Must include: ${template.constraints.mustInclude.join(', ')}` : ''}
${template.constraints?.mustAvoid ? `- Must avoid: ${template.constraints.mustAvoid.join(', ')}` : ''}
**Historical Insights:**
${historicalInsights}
Generate variants that test fundamentally different approaches:
1. One variant using SCARCITY (limited time/quantity)
2. One variant using SOCIAL PROOF (numbers, testimonials reference)
3. One variant using AUTHORITY (expertise, credentials)
4. One variant using EMOTIONAL APPEAL (pain points, aspirations)
5. One variant using SPECIFICITY (concrete numbers, details)
Return JSON in this format:
{
"variants": [
{
"content": "The variant text",
"rationale": "Why this variant might outperform",
"predictedLift": 15,
"psychologicalPrinciple": "Scarcity"
}
]
}`;
}
private getHistoricalInsights(component: string): string {
const history = this.historicalData.get(component);
if (!history || history.length === 0) {
return 'No historical data available for this component type.';
}
const winners = history
.filter(v => v.isWinner)
.sort((a, b) => b.lift - a.lift)
.slice(0, 3);
const losers = history
.filter(v => !v.isWinner)
.sort((a, b) => a.lift - b.lift)
.slice(0, 3);
return `
Top performing patterns:
${winners.map(w => `- "${w.content}" achieved ${w.lift}% lift using ${w.principle}`).join('\n')}
Patterns that underperformed:
${losers.map(l => `- "${l.content}" had ${l.lift}% negative lift`).join('\n')}
`;
}
private processGeneratedVariants(raw: any[]): GeneratedVariant[] {
return raw.map((v, index) => ({
id: `variant-${Date.now()}-${index}`,
content: v.content,
rationale: v.rationale,
predictedLift: v.predictedLift || 0,
psychologicalPrinciple: v.psychologicalPrinciple || 'Unknown'
}));
}
// Learn from test results to improve future generations
recordTestResult(
component: string,
content: string,
principle: string,
lift: number,
isWinner: boolean
): void {
if (!this.historicalData.has(component)) {
this.historicalData.set(component, []);
}
this.historicalData.get(component)!.push({
content,
principle,
lift,
isWinner,
timestamp: Date.now()
});
// Keep only recent data (last 100 tests per component)
const data = this.historicalData.get(component)!;
if (data.length > 100) {
this.historicalData.set(component, data.slice(-100));
}
}
}
interface VariantPerformance {
content: string;
principle: string;
lift: number;
isWinner: boolean;
timestamp: number;
}
// Usage example
async function generateCTAVariants() {
const generator = new AIVariantGenerator(process.env.OPENAI_API_KEY!);
const variants = await generator.generateVariants({
component: 'cta',
currentContent: 'Sign Up Now',
context: {
product: 'SaaS project management tool',
targetAudience: 'Startup founders and product managers',
valueProposition: 'Ship products 2x faster with AI-powered task prioritization',
tone: 'professional'
},
constraints: {
maxLength: 25,
mustAvoid: ['Free', 'Trial'] // Already mentioned elsewhere
}
});
console.log('Generated CTA Variants:');
variants.forEach(v => {
console.log(`- "${v.content}" (${v.psychologicalPrinciple}, predicted +${v.predictedLift}%)`);
console.log(` Rationale: ${v.rationale}`);
});
}
Calculating Statistical Significance Correctly
One of the biggest pitfalls in A/B testing is declaring a winner too early or misunderstanding statistical significance. According to Evan Miller's research, the most common mistake is "peeking" at results and stopping when significance is first reached. Here's how to do it correctly:
// statistical-significance.ts
interface ExperimentData {
control: {
visitors: number;
conversions: number;
revenue?: number;
};
treatment: {
visitors: number;
conversions: number;
revenue?: number;
};
}
interface SignificanceResult {
isSignificant: boolean;
confidence: number;
pValue: number;
relativeUplift: number;
absoluteUplift: number;
confidenceInterval: [number, number];
sampleSizeSufficient: boolean;
recommendedAdditionalSamples: number;
powerAnalysis: {
currentPower: number;
minimumDetectableEffect: number;
};
}
class StatisticalAnalyzer {
private readonly SIGNIFICANCE_THRESHOLD = 0.05; // 95% confidence
private readonly MINIMUM_POWER = 0.8; // 80% power
analyzeExperiment(data: ExperimentData): SignificanceResult {
const controlRate = data.control.conversions / data.control.visitors;
const treatmentRate = data.treatment.conversions / data.treatment.visitors;
const absoluteUplift = treatmentRate - controlRate;
const relativeUplift = controlRate > 0
? ((treatmentRate - controlRate) / controlRate) * 100
: 0;
// Calculate p-value using two-proportion z-test
const pValue = this.twoProportionZTest(data);
const isSignificant = pValue < this.SIGNIFICANCE_THRESHOLD;
const confidence = (1 - pValue) * 100; // dashboard convention, not a posterior probability
// Calculate confidence interval
const confidenceInterval = this.calculateConfidenceInterval(data);
// Power analysis
const powerAnalysis = this.calculatePower(data);
const sampleSizeSufficient = powerAnalysis.currentPower >= this.MINIMUM_POWER;
// Calculate recommended additional samples
const recommendedAdditionalSamples = sampleSizeSufficient
? 0
: this.calculateRequiredSamples(controlRate, treatmentRate) -
(data.control.visitors + data.treatment.visitors);
return {
isSignificant,
confidence,
pValue,
relativeUplift,
absoluteUplift,
confidenceInterval,
sampleSizeSufficient,
recommendedAdditionalSamples: Math.max(0, recommendedAdditionalSamples),
powerAnalysis
};
}
private twoProportionZTest(data: ExperimentData): number {
const n1 = data.control.visitors;
const n2 = data.treatment.visitors;
const p1 = data.control.conversions / n1;
const p2 = data.treatment.conversions / n2;
// Pooled proportion
const pooledP = (data.control.conversions + data.treatment.conversions) / (n1 + n2);
// Standard error
const standardError = Math.sqrt(
pooledP * (1 - pooledP) * (1 / n1 + 1 / n2)
);
if (standardError === 0) return 1;
// Z-score
const zScore = (p2 - p1) / standardError;
// Two-tailed p-value
const pValue = 2 * (1 - this.normalCDF(Math.abs(zScore)));
return pValue;
}
private calculateConfidenceInterval(
data: ExperimentData,
confidenceLevel: number = 0.95
): [number, number] {
const p1 = data.control.conversions / data.control.visitors;
const p2 = data.treatment.conversions / data.treatment.visitors;
const diff = p2 - p1;
// Standard error of difference
const se = Math.sqrt(
(p1 * (1 - p1)) / data.control.visitors +
(p2 * (1 - p2)) / data.treatment.visitors
);
// Z-score for confidence level
const zScore = this.inverseNormalCDF((1 + confidenceLevel) / 2);
return [
(diff - zScore * se) * 100,
(diff + zScore * se) * 100
];
}
private calculatePower(data: ExperimentData): {
currentPower: number;
minimumDetectableEffect: number;
} {
const p1 = data.control.conversions / data.control.visitors;
const p2 = data.treatment.conversions / data.treatment.visitors;
const n = Math.min(data.control.visitors, data.treatment.visitors);
// Effect size (Cohen's h)
const h = 2 * Math.asin(Math.sqrt(p2)) - 2 * Math.asin(Math.sqrt(p1));
// Standard error
const se = Math.sqrt(2 / n);
// Non-centrality parameter
const ncp = Math.abs(h) / se;
// Z-score for alpha
const zAlpha = this.inverseNormalCDF(1 - this.SIGNIFICANCE_THRESHOLD / 2);
// Power calculation
const power = 1 - this.normalCDF(zAlpha - ncp);
// Minimum Detectable Effect: (z_alpha/2 + z_beta) ≈ 2.8 at alpha=0.05, power=0.8
const mde = 2.8 * Math.sqrt((2 * p1 * (1 - p1)) / n) * 100;
return {
currentPower: Math.min(1, Math.max(0, power)),
minimumDetectableEffect: mde
};
}
private calculateRequiredSamples(
controlRate: number,
treatmentRate: number,
alpha: number = 0.05,
power: number = 0.8
): number {
const zAlpha = this.inverseNormalCDF(1 - alpha / 2);
const zBeta = this.inverseNormalCDF(power);
const pooledP = (controlRate + treatmentRate) / 2;
const delta = Math.abs(treatmentRate - controlRate);
if (delta === 0) return Infinity;
// Per-group sample size for a two-proportion z-test
const n = Math.pow(
(zAlpha * Math.sqrt(2 * pooledP * (1 - pooledP)) +
zBeta * Math.sqrt(controlRate * (1 - controlRate) +
treatmentRate * (1 - treatmentRate))) / delta,
2
);
return Math.ceil(n) * 2; // Total for both groups
}
// Sequential testing to avoid peeking problem
sequentialTest(
data: ExperimentData,
maxSamples: number
): {
decision: 'winner' | 'loser' | 'continue';
boundary: number;
} {
const totalSamples = data.control.visitors + data.treatment.visitors;
const information = totalSamples / maxSamples;
// O'Brien-Fleming spending function
const alphaSpent = this.obrienFlemingBoundary(information);
const currentPValue = this.twoProportionZTest(data);
if (currentPValue < alphaSpent) {
const treatmentRate = data.treatment.conversions / data.treatment.visitors;
const controlRate = data.control.conversions / data.control.visitors;
return {
decision: treatmentRate > controlRate ? 'winner' : 'loser',
boundary: alphaSpent
};
}
if (information >= 1) {
// Horizon reached without crossing the boundary: challenger never won
return { decision: 'loser', boundary: alphaSpent };
}
return { decision: 'continue', boundary: alphaSpent };
}
private obrienFlemingBoundary(information: number): number {
// O'Brien-Fleming alpha spending function
if (information <= 0) return 0;
if (information >= 1) return this.SIGNIFICANCE_THRESHOLD;
const zBoundary = this.inverseNormalCDF(1 - this.SIGNIFICANCE_THRESHOLD / 2) /
Math.sqrt(information);
return 2 * (1 - this.normalCDF(zBoundary));
}
private normalCDF(x: number): number {
const a1 = 0.254829592;
const a2 = -0.284496736;
const a3 = 1.421413741;
const a4 = -1.453152027;
const a5 = 1.061405429;
const p = 0.3275911;
const sign = x < 0 ? -1 : 1;
x = Math.abs(x) / Math.sqrt(2);
const t = 1.0 / (1.0 + p * x);
const y = 1.0 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.exp(-x * x);
return 0.5 * (1.0 + sign * y);
}
private inverseNormalCDF(p: number): number {
// Rational approximation for inverse normal CDF
if (p <= 0) return -Infinity;
if (p >= 1) return Infinity;
const a = [
-3.969683028665376e+01, 2.209460984245205e+02,
-2.759285104469687e+02, 1.383577518672690e+02,
-3.066479806614716e+01, 2.506628277459239e+00
];
const b = [
-5.447609879822406e+01, 1.615858368580409e+02,
-1.556989798598866e+02, 6.680131188771972e+01,
-1.328068155288572e+01
];
const c = [
-7.784894002430293e-03, -3.223964580411365e-01,
-2.400758277161838e+00, -2.549732539343734e+00,
4.374664141464968e+00, 2.938163982698783e+00
];
const d = [
7.784695709041462e-03, 3.224671290700398e-01,
2.445134137142996e+00, 3.754408661907416e+00
];
const pLow = 0.02425;
const pHigh = 1 - pLow;
let q: number, r: number;
if (p < pLow) {
q = Math.sqrt(-2 * Math.log(p));
return (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5]) /
((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1);
} else if (p <= pHigh) {
q = p - 0.5;
r = q * q;
return (((((a[0] * r + a[1]) * r + a[2]) * r + a[3]) * r + a[4]) * r + a[5]) * q /
(((((b[0] * r + b[1]) * r + b[2]) * r + b[3]) * r + b[4]) * r + 1);
} else {
q = Math.sqrt(-2 * Math.log(1 - p));
return -(((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5]) /
((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1);
}
}
}
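The sequentialTest method exists because naive repeated checking inflates the false-positive rate. A quick Monte Carlo sketch makes that concrete: run many A/A tests (both arms share the same true rate, so any "significant" result is a false positive by construction), stop at the first naive z > 1.96 after every batch, and compare against checking only once at the end. All numbers here (rates, batch sizes, look counts, seed) are illustrative.

```typescript
// Deterministic PRNG (mulberry32) so the results are reproducible
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One A/A experiment, checked for significance after every batch of traffic
function runAA(
  rand: () => number,
  rate = 0.05,
  perLook = 200,
  looks = 20
): { anyPeek: boolean; finalLook: boolean } {
  let convA = 0, convB = 0, n = 0;
  let anyPeek = false, finalLook = false;
  for (let look = 1; look <= looks; look++) {
    for (let i = 0; i < perLook; i++) {
      if (rand() < rate) convA++;
      if (rand() < rate) convB++;
    }
    n += perLook;
    const p1 = convA / n, p2 = convB / n;
    const pooled = (convA + convB) / (2 * n);
    const se = Math.sqrt(pooled * (1 - pooled) * (2 / n));
    const z = se > 0 ? Math.abs(p2 - p1) / se : 0;
    const significant = z > 1.96; // naive fixed threshold, nominal alpha = 0.05
    if (significant) anyPeek = true;
    if (look === looks) finalLook = significant;
  }
  return { anyPeek, finalLook };
}

const rand = mulberry32(42);
const sims = 500;
let peekFP = 0, fixedFP = 0;
for (let s = 0; s < sims; s++) {
  const { anyPeek, finalLook } = runAA(rand);
  if (anyPeek) peekFP++;
  if (finalLook) fixedFP++;
}
console.log(`Stop-at-first-significance false positives: ${((peekFP / sims) * 100).toFixed(1)}%`);
console.log(`Single final look false positives: ${((fixedFP / sims) * 100).toFixed(1)}%`);
```

With 20 looks, the stop-when-significant rule fires in well over 5% of A/A runs, while the single final look stays near the nominal 5%. The O'Brien-Fleming boundary keeps the overall error rate near nominal by demanding much stronger evidence at early looks.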
Continuous Optimization Loop
The ultimate goal is a self-optimizing system that continuously generates, tests, and promotes winning variants without manual intervention. Here's how to build this continuous optimization loop:
// continuous-optimization.ts
interface OptimizationTarget {
id: string;
component: string;
metric: 'conversion_rate' | 'revenue_per_visitor' | 'engagement';
currentBest: string;
variants: ActiveVariant[];
history: TestResult[];
}
interface ActiveVariant {
id: string;
content: string;
status: 'testing' | 'champion' | 'retired';
impressions: number;
conversions: number;
revenue: number;
createdAt: Date;
principle: string;
}
interface TestResult {
variantId: string;
content: string;
impressions: number;
conversions: number;
conversionRate: number;
lift: number;
isWinner: boolean;
completedAt: Date;
}
class ContinuousOptimizer {
private targets: Map<string, OptimizationTarget> = new Map();
private bandit: MultiArmedBandit;
private variantGenerator: AIVariantGenerator;
private analyzer: StatisticalAnalyzer;
private config = {
minImpressionsPerVariant: 1000,
maxActiveVariants: 5,
confidenceThreshold: 0.95,
minLiftToPromote: 0.05, // 5% minimum lift
refreshIntervalHours: 24,
retireAfterLosses: 3
};
constructor(
bandit: MultiArmedBandit,
variantGenerator: AIVariantGenerator,
analyzer: StatisticalAnalyzer
) {
this.bandit = bandit;
this.variantGenerator = variantGenerator;
this.analyzer = analyzer;
}
async initializeTarget(
id: string,
component: string,
metric: 'conversion_rate' | 'revenue_per_visitor' | 'engagement',
controlContent: string
): Promise<void> {
const target: OptimizationTarget = {
id,
component,
metric,
currentBest: 'control',
variants: [{
id: 'control',
content: controlContent,
status: 'champion',
impressions: 0,
conversions: 0,
revenue: 0,
createdAt: new Date(),
principle: 'Original'
}],
history: []
};
this.targets.set(id, target);
// Generate initial challenger variants
await this.generateNewChallengers(id);
}
private async generateNewChallengers(targetId: string): Promise<void> {
const target = this.targets.get(targetId);
if (!target) return;
const champion = target.variants.find(v => v.status === 'champion');
if (!champion) return;
const activeCount = target.variants.filter(v => v.status === 'testing').length;
const neededVariants = this.config.maxActiveVariants - activeCount - 1; // -1 for champion
if (neededVariants <= 0) return;
const newVariants = await this.variantGenerator.generateVariants({
component: target.component as any,
currentContent: champion.content,
context: {
product: 'Your product',
targetAudience: 'Your audience',
valueProposition: 'Your value prop',
tone: 'professional'
}
}, neededVariants);
newVariants.forEach(v => {
target.variants.push({
id: v.id,
content: v.content,
status: 'testing',
impressions: 0,
conversions: 0,
revenue: 0,
createdAt: new Date(),
principle: v.psychologicalPrinciple
});
});
}
// Main optimization loop - call this on each impression
selectVariant(targetId: string): ActiveVariant | null {
const target = this.targets.get(targetId);
if (!target) return null;
const activeVariants = target.variants.filter(
v => v.status === 'champion' || v.status === 'testing'
);
if (activeVariants.length === 0) return null;
// Use bandit to select (assumes one bandit instance per target; with a
// shared bandit the selected id may belong to another target and fall back to [0])
const selectedId = this.bandit.selectVariant();
return activeVariants.find(v => v.id === selectedId) || activeVariants[0];
}
// Record outcome and trigger analysis
async recordOutcome(
targetId: string,
variantId: string,
converted: boolean,
revenue: number = 0
): Promise<void> {
const target = this.targets.get(targetId);
if (!target) return;
const variant = target.variants.find(v => v.id === variantId);
if (!variant) return;
variant.impressions++;
if (converted) {
variant.conversions++;
variant.revenue += revenue;
}
this.bandit.recordOutcome(variantId, converted, revenue);
// Check if we should evaluate
if (this.shouldEvaluate(target)) {
await this.evaluateAndOptimize(targetId);
}
}
private shouldEvaluate(target: OptimizationTarget): boolean {
const testingVariants = target.variants.filter(v => v.status === 'testing');
// Evaluate when any testing variant has enough impressions
return testingVariants.some(
v => v.impressions >= this.config.minImpressionsPerVariant
);
}
private async evaluateAndOptimize(targetId: string): Promise<void> {
const target = this.targets.get(targetId);
if (!target) return;
const champion = target.variants.find(v => v.status === 'champion');
if (!champion) return;
const testingVariants = target.variants.filter(v => v.status === 'testing');
for (const challenger of testingVariants) {
if (challenger.impressions < this.config.minImpressionsPerVariant) {
continue;
}
const result = this.analyzer.analyzeExperiment({
control: {
visitors: champion.impressions,
conversions: champion.conversions,
revenue: champion.revenue
},
treatment: {
visitors: challenger.impressions,
conversions: challenger.conversions,
revenue: challenger.revenue
}
});
if (result.isSignificant && result.sampleSizeSufficient) {
if (result.relativeUplift >= this.config.minLiftToPromote * 100) {
// Challenger wins - promote to champion
await this.promoteChallenger(target, champion, challenger, result);
} else if (result.relativeUplift <= -this.config.minLiftToPromote * 100) {
// Challenger loses - retire it
this.retireVariant(target, challenger, result);
}
}
}
// Generate new challengers if needed
await this.generateNewChallengers(targetId);
}
private async promoteChallenger(
target: OptimizationTarget,
oldChampion: ActiveVariant,
newChampion: ActiveVariant,
result: SignificanceResult
): Promise<void> {
// Record result
target.history.push({
variantId: newChampion.id,
content: newChampion.content,
impressions: newChampion.impressions,
conversions: newChampion.conversions,
conversionRate: newChampion.conversions / newChampion.impressions,
lift: result.relativeUplift,
isWinner: true,
completedAt: new Date()
});
// Update statuses
oldChampion.status = 'retired';
newChampion.status = 'champion';
target.currentBest = newChampion.id;
// Record for AI learning
this.variantGenerator.recordTestResult(
target.component,
newChampion.content,
newChampion.principle,
result.relativeUplift,
true
);
console.log(`New champion promoted for ${target.id}:`);
console.log(` Content: "${newChampion.content}"`);
console.log(` Lift: +${result.relativeUplift.toFixed(2)}%`);
console.log(` Confidence: ${result.confidence.toFixed(1)}%`);
}
private retireVariant(
target: OptimizationTarget,
variant: ActiveVariant,
result: SignificanceResult
): void {
variant.status = 'retired';
target.history.push({
variantId: variant.id,
content: variant.content,
impressions: variant.impressions,
conversions: variant.conversions,
conversionRate: variant.conversions / variant.impressions,
lift: result.relativeUplift,
isWinner: false,
completedAt: new Date()
});
this.variantGenerator.recordTestResult(
target.component,
variant.content,
variant.principle,
result.relativeUplift,
false
);
}
// Get current optimization status
getStatus(targetId: string): OptimizationStatus | null {
const target = this.targets.get(targetId);
if (!target) return null;
const champion = target.variants.find(v => v.status === 'champion');
const testing = target.variants.filter(v => v.status === 'testing');
return {
targetId,
champion: champion ? {
id: champion.id,
content: champion.content,
conversionRate: champion.impressions > 0
? champion.conversions / champion.impressions
: 0
} : null,
activeTests: testing.map(v => ({
id: v.id,
content: v.content,
impressions: v.impressions,
conversionRate: v.impressions > 0
? v.conversions / v.impressions
: 0,
principle: v.principle
})),
totalTestsRun: target.history.length,
cumulativeLift: this.calculateCumulativeLift(target.history)
};
}
private calculateCumulativeLift(history: TestResult[]): number {
if (history.length === 0) return 0;
// Calculate compound lift from all winning tests
const winners = history.filter(r => r.isWinner);
return winners.reduce((compound, result) => {
return compound * (1 + result.lift / 100);
}, 1) - 1;
}
}
interface OptimizationStatus {
targetId: string;
champion: { id: string; content: string; conversionRate: number } | null;
activeTests: Array<{
id: string;
content: string;
impressions: number;
conversionRate: number;
principle: string;
}>;
totalTestsRun: number;
cumulativeLift: number;
}
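The calculateCumulativeLift method compounds wins multiplicatively rather than adding them, because each new champion's lift applies on top of the already-improved baseline. The arithmetic, with an illustrative sequence of winning lifts:

```typescript
// Hypothetical sequence of winning lifts from successive tests, in percent
const winningLifts = [12, 8, 5];

// Compound multiplicatively: each lift applies to the previous champion's baseline
const cumulativeLift = winningLifts.reduce(
  (compound, lift) => compound * (1 + lift / 100),
  1
) - 1;

console.log(`Naive additive total: ${winningLifts.reduce((a, b) => a + b, 0)}%`);
console.log(`Compound lift: ${(cumulativeLift * 100).toFixed(2)}%`);
```

Here 1.12 × 1.08 × 1.05 − 1 ≈ 27.0%, noticeably more than the naive 25% sum; over many test cycles the gap between additive and compound accounting grows.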
React Integration for A/B Testing
Here's a complete React implementation that makes it easy to add automated A/B testing to any component:
// ab-testing-react.tsx
import React, { createContext, useContext, useEffect, useState, useCallback } from 'react';
interface ABTestContextValue {
selectVariant: (targetId: string) => ActiveVariant | null;
recordConversion: (targetId: string, variantId: string, revenue?: number) => void;
getStatus: (targetId: string) => OptimizationStatus | null;
}
const ABTestContext = createContext<ABTestContextValue | null>(null);
export function ABTestProvider({
children,
optimizer
}: {
children: React.ReactNode;
optimizer: ContinuousOptimizer;
}) {
const value: ABTestContextValue = {
selectVariant: (targetId) => optimizer.selectVariant(targetId),
recordConversion: (targetId, variantId, revenue) => {
optimizer.recordOutcome(targetId, variantId, true, revenue);
},
getStatus: (targetId) => optimizer.getStatus(targetId)
};
return (
<ABTestContext.Provider value={value}>
{children}
</ABTestContext.Provider>
);
}
// Hook for using A/B tests
export function useABTest(targetId: string) {
const context = useContext(ABTestContext);
if (!context) {
throw new Error('useABTest must be used within ABTestProvider');
}
const [variant, setVariant] = useState<ActiveVariant | null>(null);
useEffect(() => {
// Select once per target; re-running this effect would re-randomize the variant
const selected = context.selectVariant(targetId);
setVariant(selected);
}, [targetId, context]);
const recordConversion = useCallback((revenue?: number) => {
if (variant) {
context.recordConversion(targetId, variant.id, revenue);
}
}, [context, targetId, variant]);
return {
variant,
content: variant?.content || '',
variantId: variant?.id || '',
recordConversion,
isLoading: variant === null
};
}
// Component wrapper for A/B tested content
export function ABTest({
targetId,
fallback,
onConversion,
children
}: {
targetId: string;
fallback: React.ReactNode;
onConversion?: () => void;
children: (props: { content: string; recordConversion: (revenue?: number) => void }) => React.ReactNode;
}) {
const { content, recordConversion, isLoading } = useABTest(targetId);
const handleConversion = useCallback((revenue?: number) => {
recordConversion(revenue);
onConversion?.();
}, [recordConversion, onConversion]);
if (isLoading) {
return <>{fallback}</>;
}
return <>{children({ content, recordConversion: handleConversion })}</>;
}
// Example usage: A/B tested CTA button
function CTAButton() {
return (
<ABTest
targetId="homepage-cta" // illustrative target id
fallback={<button>Get Started</button>}
>
{({ content, recordConversion }) => (
<button onClick={() => recordConversion()}>{content}</button>
)}
</ABTest>
);
}
// A/B tested pricing display
function PricingCard({ price, plan }: { price: number; plan: string }) {
const { variant, recordConversion } = useABTest(`pricing-${plan}`);
const handleSubscribe = () => {
recordConversion(price); // Record with revenue
// Process subscription...
};
// Variant might control price framing
const displayPrice = variant?.content || `$${price}/month`;
return (
<div className="pricing-card">
<h3>{plan}</h3>
<p>{displayPrice}</p>
<button onClick={handleSubscribe}>Subscribe</button>
</div>
);
}
// Dashboard component for monitoring tests
function ABTestDashboard({ targetIds }: { targetIds: string[] }) {
const context = useContext(ABTestContext);
if (!context) return null;
return (
<div className="ab-test-dashboard">
<h2>A/B Test Dashboard</h2>
{targetIds.map(id => {
const status = context.getStatus(id);
if (!status) return null;
return (
<div key={id}>
<h3>{id}</h3>
<p>
Champion: "{status.champion?.content}"{' '}
({((status.champion?.conversionRate || 0) * 100).toFixed(2)}%)
</p>
<p>Active Tests ({status.activeTests.length}):</p>
<ul>
{status.activeTests.map(test => (
<li key={test.id}>
"{test.content}" ({test.impressions} impressions,{' '}
{(test.conversionRate * 100).toFixed(2)}%) - {test.principle}
</li>
))}
</ul>
<p>Total tests: {status.totalTestsRun}</p>
<p>Cumulative lift: +{(status.cumulativeLift * 100).toFixed(1)}%</p>
</div>
);
})}
</div>
);
}
Avoiding P-Hacking and Common Pitfalls
P-hacking occurs when analysts manipulate data or analysis to achieve statistical significance. According to research published in Nature, this is one of the biggest threats to valid experimentation. Here's how to prevent it:
// experiment-guardrails.ts
interface ExperimentGuardrails {
preRegistration: PreRegistration;
sampleSizeCalculation: SampleSizeCalc;
stoppingRules: StoppingRules;
multipleTestingCorrection: MTCorrection;
}
interface PreRegistration {
hypothesis: string;
primaryMetric: string;
secondaryMetrics: string[];
expectedEffectSize: number;
plannedSampleSize: number;
analysisMethod: string;
registeredAt: Date;
// Cannot be changed after experiment starts
locked: boolean;
}
interface StoppingRules {
minimumDuration: number; // days
minimumSamplesPerVariant: number;
maximumDuration: number;
earlyStoppingAllowed: boolean;
earlyStoppingMethod: 'sequential' | 'bayesian' | 'none';
}
class ExperimentValidator {
validateExperiment(
registration: PreRegistration,
currentData: ExperimentData,
stoppingRules: StoppingRules
): ValidationResult {
const issues: string[] = [];
const warnings: string[] = [];
// Check if experiment is locked
if (!registration.locked) {
issues.push('Experiment not pre-registered. Analysis may be biased.');
}
// Check minimum sample size
const minSamples = stoppingRules.minimumSamplesPerVariant;
if (currentData.control.visitors < minSamples ||
currentData.treatment.visitors < minSamples) {
issues.push(
`Insufficient sample size. Need ${minSamples} per variant, ` +
`have ${Math.min(currentData.control.visitors, currentData.treatment.visitors)}.`
);
}
// Check for sample ratio mismatch (SRM)
const srm = this.checkSampleRatioMismatch(currentData);
if (srm.hasMismatch) {
issues.push(
`Sample ratio mismatch detected (p=${srm.pValue.toFixed(4)}). ` +
`Expected 50/50, got ${srm.actualRatio}. Results may be invalid.`
);
}
// Check for novelty effect
const noveltyCheck = this.checkNoveltyEffect(currentData);
if (noveltyCheck.detected) {
warnings.push(
'Potential novelty effect detected. Treatment performance is declining over time.'
);
}
// Check for day-of-week effects
if (this.hasInsufficientWeekCoverage(currentData)) {
warnings.push(
'Experiment has not run for a full week. Day-of-week effects may bias results.'
);
}
return {
isValid: issues.length === 0,
issues,
warnings,
canDeclareWinner: issues.length === 0 && warnings.length === 0
};
}
private checkSampleRatioMismatch(
data: ExperimentData
): { hasMismatch: boolean; pValue: number; actualRatio: string } {
const total = data.control.visitors + data.treatment.visitors;
const expected = total / 2;
// Chi-square test for equal split
const chiSquare =
Math.pow(data.control.visitors - expected, 2) / expected +
Math.pow(data.treatment.visitors - expected, 2) / expected;
// Chi-square with 1 degree of freedom
const pValue = 1 - this.chiSquareCDF(chiSquare, 1);
const ratio = `${((data.control.visitors / total) * 100).toFixed(1)}/${((data.treatment.visitors / total) * 100).toFixed(1)}`;
return {
hasMismatch: pValue < 0.001, // Very strict threshold
pValue,
actualRatio: ratio
};
}
private checkNoveltyEffect(data: ExperimentData): { detected: boolean } {
// Would need time-series data to properly detect
// This is a simplified placeholder
return { detected: false };
}
private hasInsufficientWeekCoverage(data: ExperimentData): boolean {
// Check if experiment has data from all 7 days
// Would need daily breakdown to implement properly
return false;
}
private chiSquareCDF(x: number, df: number): number {
// Simplified chi-square CDF for df=1
if (x <= 0) return 0;
return this.erf(Math.sqrt(x / 2));
}
private erf(x: number): number {
const a1 = 0.254829592;
const a2 = -0.284496736;
const a3 = 1.421413741;
const a4 = -1.453152027;
const a5 = 1.061405429;
const p = 0.3275911;
const sign = x < 0 ? -1 : 1;
x = Math.abs(x);
const t = 1.0 / (1.0 + p * x);
const y = 1.0 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.exp(-x * x);
return sign * y;
}
// Bonferroni correction for multiple testing
applyMultipleTestingCorrection(
pValues: number[],
method: 'bonferroni' | 'holm' | 'fdr'
): number[] {
const n = pValues.length;
switch (method) {
case 'bonferroni':
return pValues.map(p => Math.min(1, p * n));
case 'holm':
const sorted = pValues
.map((p, i) => ({ p, i }))
.sort((a, b) => a.p - b.p);
let maxAdjusted = 0;
const adjusted = new Array(n);
sorted.forEach(({ p, i }, rank) => {
const adjustedP = Math.min(1, p * (n - rank));
maxAdjusted = Math.max(maxAdjusted, adjustedP);
adjusted[i] = maxAdjusted;
});
return adjusted;
case 'fdr': // Benjamini-Hochberg
const sortedFdr = pValues
.map((p, i) => ({ p, i }))
.sort((a, b) => b.p - a.p);
let minAdjusted = 1;
const adjustedFdr = new Array(n);
sortedFdr.forEach(({ p, i }, reverseRank) => {
const rank = n - reverseRank;
const adjustedP = Math.min(1, p * n / rank);
minAdjusted = Math.min(minAdjusted, adjustedP);
adjustedFdr[i] = minAdjusted;
});
return adjustedFdr;
default:
return pValues;
}
}
}
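As a standalone illustration of the SRM check above (the traffic numbers are hypothetical), note how a split that looks close to 50/50 can still fail the chi-square test at scale:

```typescript
// Sample ratio mismatch: chi-square test against an expected 50/50 split,
// using the same Abramowitz-Stegun erf approximation for the df=1 CDF.
function erf(x: number): number {
  const a1 = 0.254829592, a2 = -0.284496736, a3 = 1.421413741;
  const a4 = -1.453152027, a5 = 1.061405429, p = 0.3275911;
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + p * x);
  const y = 1 - (((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t) * Math.exp(-x * x);
  return sign * y;
}

function srmPValue(controlN: number, treatmentN: number): number {
  const total = controlN + treatmentN;
  const expected = total / 2;
  const chiSquare =
    (controlN - expected) ** 2 / expected +
    (treatmentN - expected) ** 2 / expected;
  return 1 - erf(Math.sqrt(chiSquare / 2)); // chi-square CDF with df = 1
}

// 50,000 vs 48,000 is roughly a 51/49 split, but at this volume it fails
console.log(srmPValue(50_000, 48_000) < 0.001); // true: investigate assignment
console.log(srmPValue(50_000, 49_900) < 0.001); // false: within random noise
```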
interface ValidationResult {
isValid: boolean;
issues: string[];
warnings: string[];
canDeclareWinner: boolean;
}
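For intuition on how the corrections above behave, here is a standalone sketch of Bonferroni and Holm applied to three hypothetical p-values; Holm is uniformly no more conservative than Bonferroni:

```typescript
// Bonferroni scales every p-value by the number of tests; Holm steps down,
// scaling the k-th smallest (0-indexed) by (n - k) and enforcing monotonicity.
function bonferroni(pValues: number[]): number[] {
  const n = pValues.length;
  return pValues.map(p => Math.min(1, p * n));
}

function holm(pValues: number[]): number[] {
  const n = pValues.length;
  const sorted = pValues.map((p, i) => ({ p, i })).sort((a, b) => a.p - b.p);
  const adjusted = new Array<number>(n);
  let runningMax = 0;
  sorted.forEach(({ p, i }, rank) => {
    runningMax = Math.max(runningMax, Math.min(1, p * (n - rank)));
    adjusted[i] = runningMax; // adjusted values never decrease with rank
  });
  return adjusted;
}

const raw = [0.01, 0.04, 0.03];
console.log(bonferroni(raw)); // ≈ [0.03, 0.12, 0.09]
console.log(holm(raw));       // ≈ [0.03, 0.06, 0.06]
```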
Key Takeaways
Remember These Points
- Use multi-armed bandits: Thompson Sampling reduces opportunity cost by 30-50% compared to traditional 50/50 splits
- Automate variant generation: AI can explore design spaces beyond human imagination with GPT-4 generated variants
- Calculate significance correctly: Use sequential testing methods to avoid the peeking problem
- Implement guardrails: Pre-registration, sample ratio mismatch detection, and multiple testing corrections prevent p-hacking
- Build continuous optimization loops: Let the system automatically promote winners and generate new challengers
- Track cumulative lift: Measure the compound improvement from all winning tests over time
- Avoid novelty effects: Run tests for at least one full week to account for day-of-week variations
Conclusion
AI-powered A/B testing represents a paradigm shift from manual, periodic optimization to continuous, self-improving systems. By implementing multi-armed bandit algorithms, you eliminate the opportunity cost of showing underperforming variants. By using AI for variant generation, you explore optimization opportunities humans would never consider. And by building proper statistical guardrails, you ensure your wins are real.
The companies achieving 20%+ conversion improvements aren't running more tests; they're running smarter tests with AI. For deeper exploration, study the Thompson Sampling research, explore Optimizely's bandit documentation, and consider tools like Statsig or GrowthBook for production implementations.
Start with a single high-impact component, perhaps your homepage CTA or pricing page headline. Implement Thompson Sampling, generate five AI variants, and let the system optimize. Within weeks, you'll see why traditional A/B testing is becoming obsolete.