
What
 the practice of showing two variants of the same web page to different segments of visitors at the same time and comparing which variant drives more conversions

A/B testing is a general methodology used online when testing product changes and new features
 A/B testing works best when testing incremental changes
 A/B testing doesn’t work well when testing major changes, like new products, new branding or completely new user experiences
 the basis for Data-Driven Development

two user experiences with random distribution of users
 randomization averages out all other factors
 lets you compare the two experiences on a single indicator
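In practice the random split is usually implemented as deterministic bucketing, so a returning user always sees the same variant. A minimal sketch under that assumption (the hashing scheme and function name are illustrative, not from the text):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user: same user + experiment -> same variant."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    # the hash value is effectively uniform, so buckets split traffic evenly
    return variants[int(digest, 16) % len(variants)]
```

Hashing on the experiment name as well as the user id keeps assignments independent across experiments.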

What to test

Acquisition
 marketing channels

Product solutions

think about the concept of product value
 think about ROI

Technical solutions
 find strange scenarios
 understand the value of refactoring
 business model

Why
 to expand your business by acquiring new customers and build relationships by catering to existing ones
 Solve Visitor Pain Points
 Get Better ROI from Existing Traffic
 Reduce Bounce Rate
 Make Low-Risk Modifications
 Achieve Statistically Significant Improvements
 Profitably Redesign your Website
 Data-Driven Development

Mistakes

Not Planning your Optimization Roadmap
 Invalid hypothesis
 Testing too Many Elements Together
 Ignoring Statistical Significance
 Using Unbalanced Traffic
 Testing for Incorrect Duration
 Failing to Follow an Iterative Process
 Using the Wrong Tools

if testing is carried out in a company that has not reached the required level of data-driven maturity
 the tests are ineffective

"+"/ "

what is good

learn more about our users
 to make strategic decisions
 insures against mistaking random fluctuations for product improvements

more flexible project infrastructure for release management: quick release of changes
 teams know the main metrics
 the team keeps track of changes and is included in the process
 hypotheses are formulated when setting tasks for the team, with an understanding of the metrics and a link to the company's goals: understanding the relevance of the business goal

what is bad
 DDD degrades the design
 it is very difficult to run a test correctly, and it is easy to draw mistaken conclusions

a cool idea may turn out not to affect the business
 very disappointing
 long, expensive, tiring

local optimum trap
 a big step is very expensive
 focus on quick, short-term, understandable goals; no focus on the long term

Maths

general population and sample

general population
 all objects that interest us

sample
 the part of the general population that we actually observe; based on its results, we try to draw conclusions about the general population

statistical methods

sample studies
 help us make informed decisions based on probabilities

estimation accuracy

confidence interval
 a range that contains the true value with a given probability
 calculation
 https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
 https://samplesize.net/confidenceintervalproportion/
 https://www.calculator.net/confidenceintervalcalculator.html
 c ± 1.645 * sqrt(c*(1-c)/N)
 c — conversion rate in the sample
 1.645 — z-coefficient for the chosen confidence level (90%)
 N — number of observations
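The formula above can be computed directly. A minimal sketch (the function name and the 90% default are assumptions):

```python
import math

def proportion_ci(c, n, z=1.645):
    """Normal-approximation (Wald) confidence interval for a conversion rate.

    c — conversion rate observed in the sample
    n — number of observations
    z — coefficient for the confidence level (1.645 for 90%)
    """
    margin = z * math.sqrt(c * (1 - c) / n)
    return c - margin, c + margin

# e.g. 100 conversions out of 1000 visitors
low, high = proportion_ci(0.10, 1000)  # roughly (0.084, 0.116)
```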

Result

Stat test

Task

determine the probability that the difference in results is due to product properties rather than chance
 errors
 true negative
 no difference
 false negative
 Type II error: the test showed no difference, but there is one
 Experiment power (sensitivity)
 depends on the effect that actually exists
 selected confidence level
 sample size
 effect size
 inversely related to the probability of a Type II error (power = 1 − β)
 the probability of overlooking an effect depends on the size of the actual effect
 false positive
 Type I error: the test showed a difference, but there is none
 significance level (1 − confidence level)
 the probability of making such a mistake
 p-value
 an estimate of the probability of obtaining the observed value by chance
 the probability of obtaining test results at least as extreme as those actually observed, assuming the null hypothesis is correct
 true positive
 there is a difference
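For conversion rates, the p-value can be estimated with a two-proportion z-test, the same calculation the linked calculators perform. A minimal stdlib sketch (the function name is illustrative):

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test: could the gap between two conversion rates be chance?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)      # shared rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# A: 100/1000 conversions, B: 130/1000 — p-value comes out around 0.035
z, p = two_proportion_z_test(100, 1000, 130, 1000)
```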

tools
 https://abtestguide.com/calc/
 https://abtestguide.com/abtestsize/

structure

Let's use H0
 assumption: there is no difference; we check whether the data contradicts this assumption
 Null hypothesis
 no difference
 Alternative hypothesis
 samples A and B are taken from populations with different distributions

options
 G-test — for binary values
 χ² test — for binary values
 Student's t-test — for continuous variables

triangle
 confidence level and power level
 sample size
 minimum detectable effect

find a balance
 for a more sensitive experiment, increase the sample size
 if you reduce the number of observations, the minimum detectable effect grows

Let's Test

Research
 How the website is currently performing
 use Heatmap tools
 quantitative and qualitative research
 Observe and Formulate Hypothesis

Define Typical Values

Confidence level

90%
 Type I error: the test shows an effect, but in fact there is none

Power

80 %
 Type II error

experiment duration
 round up to whole weeks

Hypothesis type
 One-sided

Create samples

Let's calculate the required sample size
 https://abtestguide.com/abtestsize/
 find out the duration of the experiment
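With the typical values above (90% confidence, 80% power, one-sided test), the per-variant sample size can be computed with the standard two-proportion power formula; a sketch (function name and the example traffic figure are assumptions):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(p_base, p_target, alpha=0.10, power=0.80):
    """Observations per variant to detect p_base -> p_target (one-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # ~1.28 for one-sided 90%
    z_beta = NormalDist().inv_cdf(power)        # ~0.84 for 80% power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)

# detect a lift from 10% to 12% conversion — around 2200 visitors per variant
n = sample_size_per_variant(0.10, 0.12)
# duration: total visitors for both variants / daily traffic, rounded up to whole weeks
weeks = ceil(2 * n / 500 / 7)
```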

Run Test

Split URL Testing

How
 Split URL Testing is testing multiple versions of your webpage hosted on different URLs
 to compare two versions of a product
 to find out how changes in your product have affected its use: compare the key product metrics for each version

Strategy
 Setting up pages for the Split URL test
 Adding conversion goals and estimating test duration
 Finalizing the test
 Previewing and starting the test

Multivariate Testing (MVT)
 changes are made to multiple sections of a webpage, and variations are created for all the possible combinations

Multipage Testing
 to test changes to particular elements across multiple pages

Let's test our results
 https://abtestguide.com/calc/

estimate the p-value
 https://abtestguide.com/calc/

use tools

calcs

AB testguide
 https://abtestguide.com/calc/

GTM testing
 Google Tag Manager
 https://abtestguide.com/gtmtesting/

Bayesian A/Btest Calculator
 https://abtestguide.com/bayesian/

Optimizely
 https://www.optimizely.com/samplesizecalculator/

ProdCalc
 https://prodcalc.app/

A/B Split & Multivariate Test Duration Calculator
 https://vwo.com/tools/abtestdurationcalculator/

CLT for means
 https://gallery.shinyapps.io/CLT_mean/

Standard Normal Table (right-tailed z table)
 http://www.normaltable.com/ztablerighttailed.html

Distribution Calculator
 https://gallery.shinyapps.io/dist_calc/

Sample Size Calculator (Evan’s Awesome A/B Tools)
 https://www.evanmiller.org/abtesting/samplesize.html

CI (continuous improvement) process
 Measure

Prioritize

CIE Prioritization Framework
 Confidence
 Importance
 Ease
 A/B test
 Repeat
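The CIE ratings can be combined into a single score to order the backlog; one simple convention is multiplying the three ratings (the scoring rule and the backlog data are assumptions, not from the text):

```python
# hypothetical backlog; ratings are 1-10 for Confidence, Importance, Ease
hypotheses = [
    {"name": "shorter signup form", "confidence": 8, "importance": 9, "ease": 6},
    {"name": "new hero image",      "confidence": 4, "importance": 5, "ease": 9},
    {"name": "one-click checkout",  "confidence": 7, "importance": 9, "ease": 3},
]
for h in hypotheses:
    h["score"] = h["confidence"] * h["importance"] * h["ease"]
# A/B test the highest-scoring hypothesis first
backlog = sorted(hypotheses, key=lambda h: h["score"], reverse=True)
```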

We are working in a changing and unpredictable environment

our changes in product
 better
 worse

typical (naive) ways of testing hypotheses

User's feedback
 feedback is not always truthful and relevant

Sampling bias
 statistics of feature usage
 biases
 confusing correlation with causation
 survivorship bias

Comparisons of events in time
 the product is influenced by many factors
 changes in competitors
 technical features, new technologies
 the product has become faster / slower
 seasonal demand
 pure chance

Cognitive distortions
 we see patterns where there are none
 we notice facts that confirm our beliefs and overlook others

Sources
 https://vc.ru/flood/6371aberrors
 https://vwo.com/abtesting/
 https://blog.hubspot.com/marketing/howtodoabtesting
 https://www.crazyegg.com/blog/abtesting/
 https://medium.com/@robbiegeoghegan/implementingabtestsinpython514e9eb5b3a1
 https://classroom.udacity.com/courses/ud257/lessons/4018018619/concepts/40043986940923

my sketch
 https://twitter.com/ManukhinaDarya/status/1295284820365520896?s=20