# Quick Concept: ANOVA

Understand what ANOVA is and how to use it.

# ANOVA

*ANOVA, or Analysis of Variance,* is a statistical test used to determine whether there is a significant difference between the means of several groups.

More precisely, ANOVA tests whether at least one group mean is significantly different from the others.

We use a t-test to check for a statistically significant difference between two means. When there are many groups to compare, the t-test approach becomes overwhelming, because every pair of groups needs its own test.
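To see how quickly the number of tests grows, here is a small sketch that counts the pairwise comparisons needed for a given number of groups. The helper name `pairwise_tests` and the group labels are made up for illustration.

```python
from itertools import combinations


def pairwise_tests(groups):
    """Return every pair of groups that separate two-sample tests would compare."""
    return list(combinations(groups, 2))


for k in (3, 4, 6, 10):
    # Hypothetical group labels, only used to count the pairs.
    labels = [f"group_{i}" for i in range(1, k + 1)]
    print(k, "groups ->", len(pairwise_tests(labels)), "pairwise tests")
```

With the four pages used in the examples of this article, that already means six separate tests.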

ANOVA solves that problem by testing all the group averages at once.

*Ho: There is no statistically significant difference between the group averages.*

*Ha: There is a statistically significant difference in at least one group average.*
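Under the hood, a one-way ANOVA compares the variation between the group means with the variation inside each group; the ratio of the two is the F statistic (the "F value" that R reports). The following is a minimal sketch in plain Python, using the page session times from the examples later in this article. The helper name `f_statistic` is only illustrative, and the sketch stops at the F value without computing a p-value.

```python
from statistics import mean

# Session times per page, copied from the dataset used in this article.
groups = {
    1: [164, 172, 177, 156, 195, 172],
    2: [178, 191, 182, 185, 177, 185],
    3: [175, 193, 171, 163, 176, 176],
    4: [155, 166, 164, 170, 168, 162],
}


def f_statistic(samples):
    """One-way ANOVA F statistic for a list of samples (one list per group)."""
    all_values = [x for sample in samples for x in sample]
    grand_mean = mean(all_values)
    k = len(samples)     # number of groups
    n = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)

    # Variation of each observation around its own group mean.
    ss_within = 0.0
    for s in samples:
        group_mean = mean(s)
        ss_within += sum((x - group_mean) ** 2 for x in s)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


f_value, df_between, df_within = f_statistic(list(groups.values()))
print(round(f_value, 3), df_between, df_within)
```

A large F value means the group means are spread out more than the noise inside the groups would suggest, which is evidence against *Ho*.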

# How to Perform in R

In R, we can use the built-in function `aov(target ~ explanatory, data)`. The explanatory variable must be a factor so that each level is treated as a separate group.

```r
# Create a dataset
four_sessions <- data.frame(
  # Page is wrapped in factor() so aov() treats it as four groups, not as a number
  Page = factor(c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)),
  Time = c(164, 172, 177, 156, 195, 172, 178, 191, 182, 185, 177, 185,
           175, 193, 171, 163, 176, 176, 155, 166, 164, 170, 168, 162)
)

# ANOVA Test
summary(aov(Time ~ Page, data = four_sessions))
```

```
## [OUT] ##
            Df Sum Sq Mean Sq F value Pr(>F)
Page         3   1093   364.4   4.472 0.0147 *
Residuals   20   1630    81.5
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

If we consider a significance level of 0.05, given that the p-value (0.0147) is below 0.05, we can reject *Ho* and conclude that at least one group average differs significantly from the others.

Alternatively, there is also the `aovp()` function from the `lmPerm` package, which performs a permutation-based ANOVA.

# How to Perform in Python

In Python, we can implement the solution using the `scipy` module.

The downside is that `f_oneway` takes each group as a separate argument, so in Python we must separate the groups ourselves, which adds an extra step.

```python
import pandas as pd
from scipy import stats  # imported as `stats`; `sns` is the usual alias for seaborn

# Creating a dataset
four_sessions = pd.DataFrame({
    'Page': [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4],
    'Time': [164, 172, 177, 156, 195, 172, 178, 191, 182, 185, 177, 185,
             175, 193, 171, 163, 176, 176, 155, 166, 164, 170, 168, 162]
})

# Separate groups
pg1 = four_sessions.groupby('Page').get_group(1).Time
pg2 = four_sessions.groupby('Page').get_group(2).Time
pg3 = four_sessions.groupby('Page').get_group(3).Time
pg4 = four_sessions.groupby('Page').get_group(4).Time

# ANOVA test
stats.f_oneway(pg1, pg2, pg3, pg4)
```

```
## [OUT] ##
F_onewayResult(statistic=4.472230745627494, pvalue=0.01471358269967038)
```

We get the same result. Considering a significance level of 0.05, given that the p-value is below 0.05, we can reject *Ho* and conclude that at least one group average differs significantly from the others.
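The four `get_group` calls above can also be collapsed into a single pass over the groups. This is only a sketch of one possible variant: it rebuilds the same `four_sessions` data so that it runs on its own, and the variable name `samples` is illustrative.

```python
import pandas as pd
from scipy import stats

# Same data as in the example above, rebuilt so this block is self-contained.
four_sessions = pd.DataFrame({
    "Page": [1] * 6 + [2] * 6 + [3] * 6 + [4] * 6,
    "Time": [164, 172, 177, 156, 195, 172,
             178, 191, 182, 185, 177, 185,
             175, 193, 171, 163, 176, 176,
             155, 166, 164, 170, 168, 162],
})

# Iterating over a grouped column yields (group label, Series) pairs,
# so this collects one Series of times per page.
samples = [times for _, times in four_sessions.groupby("Page")["Time"]]

# f_oneway takes each group as a separate positional argument, hence the unpacking.
result = stats.f_oneway(*samples)
print(result.statistic, result.pvalue)
```

This form does not need to change if a fifth page is added to the data, whereas the explicit version would need one more variable.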

# References

BRUCE, Peter et al. 2019. *Practical Statistics for Data Scientists.* O’Reilly.