A/B Testing
Run experiments with string and JSON flags to measure the impact of changes
Feature flags make it easy to run A/B tests. Create a flag with multiple variations, split traffic between them, and measure which variation performs best.
How it works
- Create a string or JSON flag with two or more variations
- Set the rollout percentage to control how many users enter the experiment
- Users are deterministically bucketed — the same user always sees the same variation
- Track conversions and metrics to determine a winner
Setting up an A/B test
Step 1: Create a string flag
Create a new flag in the dashboard:
- Key: checkout-layout
- Type: String
- Variations:
| Index | Name | Value |
|---|---|---|
| 0 | Control | "control" |
| 1 | Variant A | "single-page" |
| 2 | Variant B | "multi-step" |
Step 2: Set the rollout
Set the rollout percentage to 100% so all users enter the experiment. The SDK's bucketing algorithm (hash(userId + flagKey) % 100) automatically distributes users evenly across variations.
To limit the experiment to a subset of users, lower the rollout percentage. Users outside the rollout will get the off variation.
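The rollout gate described above can be sketched as follows. This is an illustration only, not Flagpool's actual SDK internals: the hash function and the `bucketOf`/`isInRollout` helper names are stand-ins.

```typescript
// Illustrative only: a simple stand-in hash, not Flagpool's internal algorithm.
function bucketOf(userId: string, flagKey: string): number {
  let h = 0
  for (const ch of userId + flagKey) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0 // keep within unsigned 32-bit range
  }
  return h % 100 // bucket in [0, 99]
}

// Users whose bucket falls below the rollout percentage enter the experiment;
// everyone else receives the off variation.
function isInRollout(userId: string, flagKey: string, rolloutPercent: number): boolean {
  return bucketOf(userId, flagKey) < rolloutPercent
}
```

Because the bucket depends only on the user ID and flag key, the same user lands on the same side of the gate on every evaluation.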
Step 3: Evaluate in code
```typescript
const layout = client.getValue('checkout-layout')

switch (layout) {
  case 'single-page':
    renderSinglePageCheckout()
    break
  case 'multi-step':
    renderMultiStepCheckout()
    break
  default: // 'control' and the off variation both land here
    renderCurrentCheckout()
}
```
Or, with the React SDK:

```tsx
import { Variant } from '@flagpool/react'

function Checkout() {
  return (
    <>
      <Variant flag="checkout-layout" value="single-page">
        <SinglePageCheckout />
      </Variant>
      <Variant flag="checkout-layout" value="multi-step">
        <MultiStepCheckout />
      </Variant>
      <Variant flag="checkout-layout" value="control">
        <CurrentCheckout />
      </Variant>
    </>
  )
}
```
Step 4: Track results
Instrument your analytics to track which variation each user sees, then measure the metrics that matter:
- Primary metric — e.g., conversion rate, revenue per user
- Secondary metrics — e.g., time on page, bounce rate
- Guardrail metrics — e.g., error rate, support tickets
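One way to wire this up is to emit an exposure event tagged with the variation, and tag conversion events the same way so results can be grouped by variation. A sketch, assuming a generic `analytics.track(event, properties)` client; the event and property names here are illustrative, not a Flagpool API:

```typescript
// Minimal in-memory analytics stub standing in for your real client.
type TrackedEvent = { name: string; props: Record<string, string> }
const events: TrackedEvent[] = []
const analytics = {
  track(name: string, props: Record<string, string>) {
    events.push({ name, props })
  },
}

// Record which variation the user saw.
function trackExposure(variation: string) {
  analytics.track('experiment_exposure', { flag: 'checkout-layout', variation })
}

// Tag conversions with the same variation so they can be segmented later.
function trackConversion(variation: string, revenueCents: number) {
  analytics.track('checkout_converted', {
    flag: 'checkout-layout',
    variation,
    revenue_cents: String(revenueCents),
  })
}
```

Emitting the exposure event at evaluation time (rather than inferring it later) avoids counting users who never actually saw the variation.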
Bucketing algorithm
Flagpool uses a deterministic hash to assign users to variations:
bucket = hash(userId + flagKey) % 100
This means:
- The same user always sees the same variation for a given flag
- Different flags produce different bucketing (no correlation between experiments)
- No database or external state is needed
- Works consistently across all SDKs
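Put together, variation assignment might look like the sketch below. FNV-1a is a concrete stand-in for whatever hash the SDK actually uses; the point is the deterministic bucket-to-variation mapping with an even split across the three variations from Step 1.

```typescript
// FNV-1a string hash: a stand-in, not Flagpool's internal hash function.
function fnv1a(input: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    h = Math.imul(h ^ input.charCodeAt(i), 0x01000193) >>> 0
  }
  return h
}

const variations = ['control', 'single-page', 'multi-step']

// hash(userId + flagKey) % 100 picks a bucket, and buckets are split
// evenly across the variations.
function assignVariation(userId: string, flagKey: string): string {
  const bucket = fnv1a(userId + flagKey) % 100 // bucket in [0, 99]
  const sliceSize = 100 / variations.length    // ~33.3 buckets per variation
  return variations[Math.floor(bucket / sliceSize)]
}
```

Concatenating the flag key into the hash input is what decorrelates experiments: the same user hashes to unrelated buckets for different flags.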
Targeting specific segments
You can limit experiments to specific user segments using targeting rules. For example, to run an experiment only for US users on the pro plan:
- Add a rule: country eq "US" AND plan eq "pro" → enter experiment
- Everyone else gets the default (control) variation
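In application terms, the rule gates experiment entry on user attributes. The targeting rules themselves live in the Flagpool dashboard; this sketch just mirrors the logic with a hypothetical `User` shape:

```typescript
interface User {
  country: string
  plan: string
}

// Mirrors the dashboard rule: country eq "US" AND plan eq "pro".
function entersExperiment(user: User): boolean {
  return user.country === 'US' && user.plan === 'pro'
}
```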
Ending an experiment
When you've gathered enough data:
- Pick the winner — the variation with the best results
- Set the winning variation as the default — update the flag's default variation to the winner
- Remove the experiment code — clean up the conditional logic and replace it with the winning implementation
- Archive the flag in the dashboard
Best practices
Run experiments long enough
Don't call an experiment too early. Make sure your sample is large enough to produce statistically significant results before declaring a winner.
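As a rough sanity check on "long enough", a two-proportion z-test over conversion counts can flag whether an observed difference could plausibly be noise. This is a generic statistics sketch, not a Flagpool feature; use a proper experimentation or stats tool for real decisions.

```typescript
// Two-proportion z-test: how many standard errors apart are the two
// conversion rates? |z| > 1.96 roughly corresponds to p < 0.05 (two-sided).
function zScore(conversionsA: number, usersA: number,
                conversionsB: number, usersB: number): number {
  const pA = conversionsA / usersA
  const pB = conversionsB / usersB
  const pooled = (conversionsA + conversionsB) / (usersA + usersB)
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / usersA + 1 / usersB))
  return (pB - pA) / se
}
```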
One change at a time
Each experiment should test one hypothesis. Avoid testing multiple unrelated changes in the same flag.
Use guardrail metrics
Always monitor error rates and user satisfaction alongside your primary metric. A variant that increases conversions but also increases errors is not a winner.
Document experiments
Keep a record of what you tested, the hypothesis, the results, and the decision made. This prevents re-running the same experiment later.
Next steps
- Flag Types — string and JSON flags for experiments
- Rollouts — percentage-based bucketing
- Targeting — segment experiments by audience