A/B Testing
Run experiments with string and JSON flags to measure the impact of changes
Feature flags make it easy to run A/B tests. Create a flag with multiple variations, split traffic between them, and measure which variation performs best.
How it works
- Create a string or JSON flag with two or more variations
- Set the rollout percentage to control how many users enter the experiment
- Users are deterministically bucketed — the same user always sees the same variation
- Track conversions and metrics to determine a winner
Setting up an A/B test
Step 1: Create a string flag
Create a new flag in the dashboard:
- Key: checkout-layout
- Type: String
- Variations:
| Index | Name | Value |
|---|---|---|
| 0 | Control | "control" |
| 1 | Variant A | "single-page" |
| 2 | Variant B | "multi-step" |
Step 2: Set the rollout
Set the rollout percentage to 100% so all users enter the experiment. The SDK's bucketing algorithm (hash(userId + flagKey) % 100) automatically distributes users evenly across variations.
To limit the experiment to a subset of users, lower the rollout percentage. Users outside the rollout will get the off variation.
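The rollout gate described above can be sketched as follows. This is an illustration only, not Flagpool's actual SDK internals: the hash function and the `bucketOf`/`isInRollout` helper names are stand-ins.

```typescript
// Illustrative only: a simple stand-in hash, not Flagpool's internal algorithm.
function bucketOf(userId: string, flagKey: string): number {
  let h = 0
  for (const ch of userId + flagKey) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0 // keep within unsigned 32-bit range
  }
  return h % 100 // bucket in [0, 99]
}

// Users whose bucket falls below the rollout percentage enter the experiment;
// everyone else receives the off variation.
function isInRollout(userId: string, flagKey: string, rolloutPercent: number): boolean {
  return bucketOf(userId, flagKey) < rolloutPercent
}
```

Because the bucket depends only on the user ID and flag key, the same user lands on the same side of the gate on every evaluation.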
Step 3: Evaluate in code
```typescript
const layout = client.getValue('checkout-layout')

switch (layout) {
  case 'single-page':
    renderSinglePageCheckout()
    break
  case 'multi-step':
    renderMultiStepCheckout()
    break
  default: // 'control' and the off variation both land here
    renderCurrentCheckout()
}
```
Or, with the React SDK:

```tsx
import { Variant } from '@flagpool/react'

function Checkout() {
  return (
    <>
      <Variant flag="checkout-layout" value="single-page">
        <SinglePageCheckout />
      </Variant>
      <Variant flag="checkout-layout" value="multi-step">
        <MultiStepCheckout />
      </Variant>
      <Variant flag="checkout-layout" value="control">
        <CurrentCheckout />
      </Variant>
    </>
  )
}
```
Step 4: Track results
Instrument your analytics to track which variation each user sees, then measure the metrics that matter:
- Primary metric — e.g., conversion rate, revenue per user
- Secondary metrics — e.g., time on page, bounce rate
- Guardrail metrics — e.g., error rate, support tickets
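One way to wire this up is to emit an exposure event tagged with the variation, and tag conversion events the same way so results can be grouped by variation. A sketch, assuming a generic `analytics.track(event, properties)` client; the event and property names here are illustrative, not a Flagpool API:

```typescript
// Minimal in-memory analytics stub standing in for your real client.
type TrackedEvent = { name: string; props: Record<string, string> }
const events: TrackedEvent[] = []
const analytics = {
  track(name: string, props: Record<string, string>) {
    events.push({ name, props })
  },
}

// Record which variation the user saw.
function trackExposure(variation: string) {
  analytics.track('experiment_exposure', { flag: 'checkout-layout', variation })
}

// Tag conversions with the same variation so they can be segmented later.
function trackConversion(variation: string, revenueCents: number) {
  analytics.track('checkout_converted', {
    flag: 'checkout-layout',
    variation,
    revenue_cents: String(revenueCents),
  })
}
```

Emitting the exposure event at evaluation time (rather than inferring it later) avoids counting users who never actually saw the variation.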
Bucketing algorithm
Flagpool uses a deterministic hash to assign users to variations:
bucket = hash(userId + flagKey) % 100
This means:
- The same user always sees the same variation for a given flag
- Different flags produce different bucketing (no correlation between experiments)
- No database or external state is needed
- Works consistently across all SDKs
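Put together, variation assignment might look like the sketch below. FNV-1a is a concrete stand-in for whatever hash the SDK actually uses; the point is the deterministic bucket-to-variation mapping with an even split across the three variations from Step 1.

```typescript
// FNV-1a string hash: a stand-in, not Flagpool's internal hash function.
function fnv1a(input: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    h = Math.imul(h ^ input.charCodeAt(i), 0x01000193) >>> 0
  }
  return h
}

const variations = ['control', 'single-page', 'multi-step']

// hash(userId + flagKey) % 100 picks a bucket, and buckets are split
// evenly across the variations.
function assignVariation(userId: string, flagKey: string): string {
  const bucket = fnv1a(userId + flagKey) % 100 // bucket in [0, 99]
  const sliceSize = 100 / variations.length    // ~33.3 buckets per variation
  return variations[Math.floor(bucket / sliceSize)]
}
```

Concatenating the flag key into the hash input is what decorrelates experiments: the same user hashes to unrelated buckets for different flags.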
Targeting specific segments
You can limit experiments to specific user segments using targeting rules. For example, to run an experiment only for US users on the pro plan:
- Add a rule: country eq "US" AND plan eq "pro" → enter experiment
- Everyone else gets the default (control) variation
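In application terms, the rule gates experiment entry on user attributes. The targeting rules themselves live in the Flagpool dashboard; this sketch just mirrors the logic with a hypothetical `User` shape:

```typescript
interface User {
  country: string
  plan: string
}

// Mirrors the dashboard rule: country eq "US" AND plan eq "pro".
function entersExperiment(user: User): boolean {
  return user.country === 'US' && user.plan === 'pro'
}
```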
Ending an experiment
When you've gathered enough data:
- Pick the winner — the variation with the best results
- Set the winning variation as the default — update the flag's default variation to the winner
- Remove the experiment code — clean up the conditional logic and replace it with the winning implementation
- Archive the flag in the dashboard
Best practices
Run experiments long enough
Don't call an experiment too early. Make sure your sample is large enough to produce statistically significant results before declaring a winner.
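As a rough sanity check on "long enough", a two-proportion z-test over conversion counts can flag whether an observed difference could plausibly be noise. This is a generic statistics sketch, not a Flagpool feature; use a proper experimentation or stats tool for real decisions.

```typescript
// Two-proportion z-test: how many standard errors apart are the two
// conversion rates? |z| > 1.96 roughly corresponds to p < 0.05 (two-sided).
function zScore(conversionsA: number, usersA: number,
                conversionsB: number, usersB: number): number {
  const pA = conversionsA / usersA
  const pB = conversionsB / usersB
  const pooled = (conversionsA + conversionsB) / (usersA + usersB)
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / usersA + 1 / usersB))
  return (pB - pA) / se
}
```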
One change at a time
Each experiment should test one hypothesis. Avoid testing multiple unrelated changes in the same flag.
Use guardrail metrics
Always monitor error rates and user satisfaction alongside your primary metric. A variant that increases conversions but also increases errors is not a winner.
Document experiments
Keep a record of what you tested, the hypothesis, the results, and the decision made. This prevents re-running the same experiment later.
Next steps
- Flag Types — string and JSON flags for experiments
- Rollouts — percentage-based bucketing
- Targeting — segment experiments by audience