Hypothesis Testing

Hypothesis testing is a formal procedure that statisticians use to decide whether to reject a statistical hypothesis. A hypothesis is a premise or claim that we want to test. The most direct way to check a claim would be to examine the entire population, but in the real world that is rarely possible. Instead, we examine a sample of data and ask whether it is consistent with the hypothesis. If the sample data are not consistent with the hypothesis, we reject it.

There are two types of hypotheses –

Null Hypothesis – The null hypothesis, denoted Ho, is the default claim or status quo, which we assume to be true unless the data provide strong evidence against it. This is what we will be testing. For example, a factory may claim that its water bottles contain an average volume of 150 ml of water.
Alternate Hypothesis – The alternate (or alternative) hypothesis, denoted Ha, is the competing claim that someone has made and that we would accept if the evidence leads us to reject the null hypothesis. In the example above, someone might claim that the bottles contain more than 150 ml of water on average.

So, in the above case, we state the hypotheses as follows, where Va is the average volume of water per bottle –

Null Hypothesis – Ho: Va = 150 ml
Alternate Hypothesis – Ha: Va > 150 ml

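Now suppose we take a sample and compute the average volume of water per bottle. A sample average that is only slightly above 150 ml is not enough on its own, because sample averages vary from one sample to the next even when the true mean is exactly 150 ml. We reject the null hypothesis only if the sample average is so far above 150 ml that such a value would be unlikely if the null hypothesis were true. Otherwise, we say that we fail to reject the null hypothesis, which is not the same as proving it true.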
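To see why a sample average above 150 ml is not enough by itself, the Python sketch below simulates many samples from bottles whose true mean is exactly 150 ml, so the null hypothesis is true by construction. The standard deviation of 5 ml, the sample size of 30 and the function name share_of_sample_means_above are assumptions made for illustration, not figures from the text.

```python
import random
import statistics

# Assumed values for illustration only: the true mean is exactly 150 ml (Ho is true),
# bottle volumes vary with a standard deviation of 5 ml, and each sample has 30 bottles.
TRUE_MEAN_ML = 150.0
SD_ML = 5.0
SAMPLE_SIZE = 30

def share_of_sample_means_above(threshold, trials=10_000, seed=1):
    """Simulate samples drawn under Ho and return the share whose mean exceeds threshold."""
    rng = random.Random(seed)
    above = 0
    for _ in range(trials):
        sample = [rng.gauss(TRUE_MEAN_ML, SD_ML) for _ in range(SAMPLE_SIZE)]
        if statistics.fmean(sample) > threshold:
            above += 1
    return above / trials

print(share_of_sample_means_above(150.0))  # close to 0.5
print(share_of_sample_means_above(151.5))  # about 0.05
```

Under these assumptions, roughly half of all samples have an average above 150 ml even though Ho is true, while an average above about 151.5 ml turns up only about 5% of the time. Choosing a cutoff like that is exactly what the significance level formalizes.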

Significance Level –

The significance level, denoted Alpha, is the probability of rejecting the null hypothesis when it is actually true. Typical values are 0.01, 0.05 and 0.1. So how do we decide on a significance level? It is usually chosen according to the type of experiment and how costly a wrong conclusion would be. A higher significance level makes it easier to reject the null hypothesis, but it also raises the chance of rejecting it wrongly. When a false rejection would be expensive, we pick a low value such as 0.01. In exploratory work, where a false alarm is cheap, a higher value such as 0.1 may be acceptable. Some people also raise the significance level in order to "prove" their hypothesis, since a higher Alpha makes rejecting the null hypothesis more likely. This is a misuse of the test: the significance level should be fixed before looking at the data.
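One way to see this trade-off is to compute, for each typical Alpha, the right-tailed cutoff on the standard normal scale (the scale used by the Z test) that a test statistic must exceed before we reject Ho. This minimal Python sketch uses the standard library's statistics.NormalDist; the function name right_tail_critical_z is hypothetical.

```python
from statistics import NormalDist

def right_tail_critical_z(alpha):
    """Critical z for a right-tailed test: the value with probability alpha above it."""
    return NormalDist().inv_cdf(1 - alpha)

# A larger alpha gives a lower cutoff, so Ho is rejected more easily.
for alpha in (0.01, 0.05, 0.1):
    print(f"alpha = {alpha}: reject Ho when z > {right_tail_critical_z(alpha):.3f}")
```

This prints cutoffs of about 2.326, 1.645 and 1.282, which is why raising Alpha increases the chance of rejecting the null hypothesis.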

Z test –

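To test a null hypothesis about a population mean, we can perform the Z test. We assume that the sample mean is normally distributed (true if the population is normal, and approximately true for large samples) and that the population standard deviation is known. We then calculate the z value, which is the number of standard errors by which the sample mean lies from the hypothesized population mean.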

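z = (Sample mean − Population mean) / (σ / √n)

Here σ is the population standard deviation and n is the sample size, so σ / √n is the standard error of the sample mean.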
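The formula translates directly into code. In this Python sketch, the ten bottle volumes and the known population standard deviation of 2 ml are made-up values for illustration, and z_statistic is a hypothetical helper name. The standard error is the population standard deviation divided by the square root of the sample size.

```python
import math
import statistics

def z_statistic(sample, mu0, sigma):
    """One-sample z statistic: (sample mean - mu0) / (sigma / sqrt(n)), sigma assumed known."""
    n = len(sample)
    standard_error = sigma / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / standard_error

# Hypothetical volumes in ml for 10 bottles; sigma = 2 ml is an assumed population SD.
volumes = [151.2, 149.8, 152.0, 150.9, 151.5, 150.3, 152.4, 149.9, 151.1, 150.9]
z = z_statistic(volumes, mu0=150.0, sigma=2.0)
print(f"z = {z:.3f}")  # z = 1.581
```

The sample mean is 151.0 ml, so the sample lies 1.0 / (2 / √10) ≈ 1.58 standard errors above the hypothesized 150 ml.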

If the sample mean is very close to the population mean, the z value is close to zero and we fail to reject the null hypothesis. But how large must the z value be for us to reject the null hypothesis? This is where we use the significance level, Alpha.

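If we are doing a one-tailed test (here, right-tailed, because Ha states that Va > 150 ml), we use Alpha to look up the critical value in the Z table: the value that leaves an area of Alpha in the upper tail, about 1.645 for Alpha = 0.05. Values of z below this critical value form the acceptance region (more precisely, the region where we fail to reject), and values above it form the rejection region. If the calculated z is greater than the critical value, we reject the null hypothesis. In a two-tailed test, Alpha is split between the two tails, so at Alpha = 0.05 we reject when z is above 1.96 or below −1.96.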
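The decision rule can be sketched in Python as follows. The function name z_test_decision and its tail parameter are hypothetical; critical values come from the standard library's statistics.NormalDist instead of a printed Z table. The two-tailed branch splits Alpha between both tails.

```python
from statistics import NormalDist

def z_test_decision(z, alpha=0.05, tail="right"):
    """Compare a z value with the critical value(s) for the given alpha and tail."""
    nd = NormalDist()
    if tail == "right":        # Ha: mean > hypothesized value
        reject = z > nd.inv_cdf(1 - alpha)
    elif tail == "left":       # Ha: mean < hypothesized value
        reject = z < nd.inv_cdf(alpha)
    elif tail == "two":        # Ha: mean != hypothesized value
        reject = abs(z) > nd.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return "reject Ho" if reject else "fail to reject Ho"

print(z_test_decision(1.581))              # fail to reject Ho (1.581 < 1.645)
print(z_test_decision(1.581, alpha=0.10))  # reject Ho (1.581 > 1.282)
print(z_test_decision(1.8, tail="two"))    # fail to reject Ho (|1.8| < 1.96)
```

The same z value can lead to different decisions depending on Alpha and on whether the test is one-tailed or two-tailed, which is why both should be chosen before the data are examined.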

Type I and Type II errors

There are two types of errors you can make in a hypothesis test: a Type I error, whose probability is Alpha, and a Type II error, whose probability is Beta.

Alpha – Also known as the level of significance. A Type I error is rejecting a null hypothesis that is actually true, and the probability of making this error is the significance level, Alpha.
Beta – A Type II error is failing to reject a null hypothesis that is actually false, and its probability is Beta. The probability of correctly rejecting a false null hypothesis is 1 − Beta. This is exactly what we want the test to achieve, so it is called the power of the test. For a fixed Alpha, we can increase the power of the test by increasing the sample size.
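For a right-tailed Z test with known σ, Beta and the power have a closed form: if the true mean is μ1 instead of the hypothesized μ0, then power = 1 − Φ(z_crit − (μ1 − μ0) / (σ / √n)), where Φ is the standard normal CDF. The Python sketch below assumes a true mean of 151 ml and σ = 2 ml purely for illustration; power_right_tailed is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def power_right_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a right-tailed one-sample z test when the true mean is mu1."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # true mean measured in standard errors
    beta = nd.cdf(z_crit - shift)                 # probability of a Type II error
    return 1 - beta

# Assumed scenario: the true mean is 151 ml, not 150 ml, and sigma = 2 ml.
for n in (10, 30, 100):
    print(n, round(power_right_tailed(150, 151, 2, n), 3))
```

Power rises from roughly 0.47 with 10 bottles to nearly 1 with 100, which illustrates the point that larger samples increase power. When μ1 equals μ0, the formula reduces to Alpha, the Type I error rate.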

P-value, critical value and Z statistic

The critical value is the value beyond which the rejection region starts, so a test statistic past it lets us reject the null hypothesis. The p-value is the probability, assuming the null hypothesis is true, of getting an observation at least as extreme as the one actually observed. To decide whether to reject the null hypothesis, we can calculate the z value and compare it with the critical z value for the chosen significance level (for example, 5%). However, that critical value has to be looked up again for every significance level we want to use. Instead, we can convert the z value into its p-value once and compare it directly with the significance level, such as 0.05 (5%) or 0.01 (1%): we reject the null hypothesis when the p-value is less than or equal to Alpha.
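Converting a z value into a p-value takes one call to the standard normal CDF. In this Python sketch, z = 1.581 is an assumed z statistic (for instance, a sample mean 1 ml above 150 ml with σ = 2 ml and n = 10), and the helper names are hypothetical.

```python
from statistics import NormalDist

def p_value_right_tailed(z):
    """Right-tailed p-value: P(Z >= z) under the standard normal."""
    return 1 - NormalDist().cdf(z)

def p_value_two_tailed(z):
    """Two-tailed p-value: P(|Z| >= |z|) under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = 1.581  # assumed z statistic for illustration
p = p_value_right_tailed(z)
for alpha in (0.05, 0.10):
    decision = "reject Ho" if p <= alpha else "fail to reject Ho"
    print(f"p = {p:.4f}, alpha = {alpha}: {decision}")
```

The p-value is about 0.057, so the null hypothesis survives at 5% but is rejected at 10%. This is the same conclusion as comparing z with the critical values 1.645 and 1.282, but the p-value is computed only once.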

Some types of hypothesis tests –

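Z Test – Compares a sample mean with a hypothesized population mean when the population standard deviation is known, or when the sample is large.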
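T Test – Used instead of the Z test when the population standard deviation is unknown and has to be estimated from the sample. A common rule of thumb is to use a T test when the sample size is smaller than 30 and a Z test otherwise, because for large samples the two give nearly the same result.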
ANOVA Test (F Test) – We can use the Z or T test to compare the means of two samples, but to compare the means of more than two samples we use analysis of variance (ANOVA), which is based on the F statistic.
Chi-Squared Test – When we have two categorical variables measured on the same sample, we use the chi-squared test of independence to check whether there is a significant association between them.
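The last three tests in the list can be sketched in plain Python as shown below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the data are made up, and the chi-squared function handles only 2x2 tables and applies no continuity correction. The code computes only the test statistics, plus a p-value for the 2x2 chi-squared case, where one degree of freedom allows an exact p-value from the normal distribution. The t and F statistics would still be compared against a t or F table with the degrees of freedom noted in the comments.

```python
import math
import statistics
from statistics import NormalDist

def t_statistic(sample, mu0):
    """One-sample t statistic: like z, but uses the sample standard deviation."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / standard_error

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(values)
    k, n_total = len(groups), len(values)
    group_means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def chi_squared_2x2(table):
    """Chi-squared test of independence for a 2x2 table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, and P(chi2_1 > s) = P(|Z| > sqrt(s)).
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(stat)))
    return stat, p_value

# Made-up data for illustration only.
volumes = [151.2, 149.8, 152.0, 150.9, 151.5, 150.3, 152.4, 149.9, 151.1, 150.9]
t = t_statistic(volumes, mu0=150.0)                   # t table, df = n - 1 = 9
f = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])  # F table, df = (k - 1, N - k) = (2, 6)
chi2, p = chi_squared_2x2([[10, 20], [30, 40]])
print(f"t = {t:.3f}, F = {f:.2f}, chi2 = {chi2:.3f} (p = {p:.3f})")
```

For the made-up bottle volumes, t ≈ 3.74 with 9 degrees of freedom. This exceeds the one-tailed 5% critical value of about 1.833, so Ho: Va = 150 ml would be rejected. The toy 2x2 table gives a chi-squared statistic of about 0.79 with p ≈ 0.37, so it shows no significant association.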