What are Non-parametric Tests? Types: Sign, Rank Correlation, Rank Sum, Matched Pairs, Chi-square Test


What are Non-parametric Tests?

Non-parametric tests are not based on assumptions about a population and its parameters. A researcher can use these tests without taking into consideration the population distribution and sample type. Non-parametric tests are also known as distribution-free tests because they do not assume that the given data follows a specific distribution. These tests are mainly used when the test model does not impose stringent conditions on the parameters of the population from which the sample is drawn.

Let us understand the reason behind choosing a non-parametric test over a parametric test with the help of a simple example. Suppose a researcher wants to find out customers' preferences among the different brands of toothpaste available in the market.

He/she would ask customers to rank the different brands according to their preferences. The data collected would be in rank form, on which parametric tests cannot be performed. This is because a parametric test requires numeric measures, such as the mean and variance, to test a hypothesis. Therefore, in this case, the researcher would use a non-parametric test.


Types of Non-parametric Tests

The different types of non-parametric tests are:

Sign Test

The sign test is considered one of the easiest non-parametric tests because it takes into account only the plus and minus signs of observations in a sample; it does not consider the magnitude of the observations. The sign test can be used in place of some parametric tests, such as the one-sample t-test and the paired t-test, and it uses the binomial distribution to test the validity of a hypothesis. There are two types of sign tests.

One Sample Sign Test

The one-sample sign test is applied when the researcher does not assume that the data is normally distributed. Under the null hypothesis, a sample value is equally likely to fall below or above the median, which means that the proportions of successes (p) and failures (q) are equal: p = q = 0.50. Therefore, it is also called the binomial sign test.

In the one-sample sign test, the researcher assigns positive (+) and negative (–) signs to sample values to test the hypothesis. The researcher usually tests the null hypothesis H0: M = M0 against an appropriate alternate hypothesis.

Here, three types of tests are possible:

Null Hypothesis    Alternate Hypothesis    Type of Test
H0: M = M0         H1: M > M0              Right-tailed test
H0: M = M0         H1: M < M0              Left-tailed test
H0: M = M0         H1: M ≠ M0              Two-tailed test

In any given sign test, each data value or observation is converted into a plus or minus sign with reference to the hypothesised median value. Values greater than the median are replaced by a plus sign, values less than the median are replaced by a minus sign, and values equal to the median are discarded. After assigning the signs, the researcher tests the null hypothesis that the probabilities of plus and minus signs are both 0.5.

The sign test can be performed using one of two methods:

  • When the sample size is small, the test is carried out by calculating binomial probabilities from the binomial probability table (see the Python sketch after this list).

  • When np ≥ 5 and nq ≥ 5, the normal distribution can be used as an approximation of the binomial distribution.
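
The small-sample case can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs: the sample values and the hypothesised median M0 = 12 are invented for the example, and scipy.stats.binomtest (SciPy ≥ 1.7) supplies the exact binomial probabilities.

```python
from scipy.stats import binomtest

# Hypothetical sample and hypothesised median (illustrative values only)
data = [8.3, 9.1, 10.4, 11.8, 12.6, 13.1, 14.7, 12.0, 15.2, 9.8]
m0 = 12.0

# Convert each observation to a sign; observations equal to M0 are discarded
diffs = [x - m0 for x in data if x != m0]
n_plus = sum(1 for d in diffs if d > 0)
n = len(diffs)

# Under H0: M = M0, the number of plus signs follows Binomial(n, 0.5)
result = binomtest(n_plus, n, p=0.5, alternative='two-sided')
print(f"plus signs: {n_plus} of {n}, p-value = {result.pvalue:.4f}")
```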

Two Sample Sign Test

In the two-sample sign test, the researcher tests two related samples. The test is a non-parametric counterpart of the paired t-test and is used when data is given in pairs. The researcher assigns positive (+) and negative (–) signs on the basis of the difference between the values of the first and second samples.

If the difference is positive, it gets a plus (+) sign; if the difference is negative, it gets a minus (–) sign. Pairs with equal values are discarded.

Thereafter, the researcher counts the plus and minus signs and divides the count by the sample size. The standard error is then calculated and the limits of the acceptance region are determined. Finally, the hypothesis is tested against these limits, as sketched below.
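
A minimal sketch of this procedure, tested here via the exact binomial route rather than the normal approximation; the paired scores below are invented for illustration.

```python
from scipy.stats import binomtest

# Hypothetical paired observations (e.g. scores before and after a change);
# the numbers are invented for illustration
sample1 = [72, 65, 80, 58, 77, 66, 71, 60]
sample2 = [68, 66, 74, 58, 70, 64, 73, 55]

# Sign of each pairwise difference; ties (zero differences) are discarded
diffs = [a - b for a, b in zip(sample1, sample2) if a != b]
n_plus = sum(1 for d in diffs if d > 0)

# Under H0 (no systematic difference), plus signs follow Binomial(n, 0.5)
result = binomtest(n_plus, len(diffs), p=0.5, alternative='two-sided')
print(f"plus signs: {n_plus} of {len(diffs)}, p-value = {result.pvalue:.4f}")
```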

Wilcoxon Matched Pairs Test/Signed Rank Test

The Wilcoxon matched pairs test (signed rank test) is a combination of the sign and rank tests and is used to compare paired samples. It is used in place of the paired t-test when the distribution is not normal, and it is chosen when the researcher wants to determine both the direction and the magnitude of the differences in the matched values. The steps to perform the test are as follows:

  • Determine the difference (di) among observed values.

  • Rank the absolute differences |di| in ascending order (lowest to highest). If the difference between two values is zero, ignore that pair.

  • Segregate the ranks according to the positive and negative signs of di values.

  • Add the ranks with negative and positive signs separately.

  • Determine the T-value by comparing the sums of ranks with negative and positive signs: T equals the smaller of the two sums.

The mean is calculated using the following formula:

Mean, µT = n(n + 1)/4

The standard deviation is calculated using the following formula:

Standard deviation, σT = √[n(n + 1)(2n + 1)/24]

where n = number of observations – number of ignored observations.

The test statistic z can be calculated as follows:

Z = (T − µT) / σT

If the calculated z-value lies within the limits of the acceptance region, the null hypothesis is accepted and the alternate hypothesis is rejected.
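
SciPy bundles these steps into scipy.stats.wilcoxon, which ranks the absolute differences, drops zero differences and reports the smaller signed-rank sum T together with a p-value (exact for small samples, normal approximation otherwise). The paired data here is invented for illustration.

```python
from scipy.stats import wilcoxon

# Hypothetical matched pairs (e.g. before/after readings) - illustrative only
before = [125, 130, 118, 140, 136, 121, 128, 133, 119, 127]
after  = [120, 132, 110, 135, 130, 122, 120, 128, 115, 124]

# For a two-sided test, the reported statistic is the smaller of the two
# signed-rank sums, i.e. the T-value described above
t_value, p = wilcoxon(before, after, alternative='two-sided')
print(f"T = {t_value}, p-value = {p:.4f}")
```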

Rank Correlation

Rank correlation, also known as Spearman's rank correlation coefficient, is used to establish correlation between two data sets that can be ranked. The steps to calculate rank correlation are as follows:

  • Assign ranks to all observations present in the two data sets in descending order. If two or more values in a data set are identical, calculate the mean rank and allocate it to all the identical values.


    For example, if third, fourth and fifth ranks have the same value, take out their mean (3 + 4 + 5)/3 = 4 and allocate it as rank to those values.

  • Calculate the difference between ranks by subtracting the rank of one data set from that of the second data set. The difference is denoted as di.

  • Calculate the square of di.

  • Find the sum of square of di.

  • Calculate Spearman's rank correlation coefficient by using the following formula:

ρ = 1 − (6 Σ di²) / [n(n² − 1)]

where n is the number of paired observations.

The value of Spearman’s rank correlation coefficient lies between +1 and –1, where +1 indicates perfect positive correlation and –1 indicates perfect negative correlation. The values that lie between +1 and –1 show different degrees of correlation. The researcher can assess the value of the rank correlation coefficient by performing a hypothesis test.

If the sample size is less than 30, the researcher needs to use the tabulated critical value of Spearman's rank correlation coefficient. Suppose the sample size n = 15 and ρ = 0.6364, which shows a reasonably high degree of correlation between the two data sets. The researcher wants to test this value of ρ to judge whether the correlation is actually present.

He/she forms the null hypothesis that there is no correlation between the two data sets and tests it at the 5% level of significance using a two-tailed test. From the table of critical values of Spearman's rank correlation coefficient, the critical values of ρ are –0.5179 (lower limit) and +0.5179 (upper limit).

The given value ρ = 0.6364 lies outside the acceptance region; therefore, the researcher rejects the null hypothesis and concludes that there is a correlation between the two data sets.
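
The whole procedure, including mean ranks for ties and the hypothesis test on ρ, is available as scipy.stats.spearmanr; the two ranking lists below are invented for illustration.

```python
from scipy.stats import spearmanr

# Hypothetical rankings of eight items by two judges - illustrative only
judge1 = [1, 2, 3, 4, 5, 6, 7, 8]
judge2 = [2, 1, 4, 3, 6, 5, 8, 7]

# spearmanr() assigns mean ranks to ties and tests H0: no correlation
rho, p = spearmanr(judge1, judge2)
print(f"rho = {rho:.4f}, p-value = {p:.4f}")
```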

Rank Sum Test

The rank sum test is used to analyse ordinal data (data in rank form) and calculate the value of the rank sum. First, the observations of the different samples are arranged in ascending order of value. Thereafter, these observations are ranked and the sum of the ranks of each sample is calculated. Finally, the sum is tested against the specified test statistic value to test the hypothesis.

Mann-Whitney Test or U Test

The Mann-Whitney test (or U test) is used to determine whether two independent samples are drawn from the same population. The test applies under general conditions and does not have any specific requirement, except that the population should be continuous. Even failure to fulfil this requirement does not have a large impact on the result.

In the Mann-Whitney test, the two samples are first merged and the data in the merged sample is ranked from lowest to highest. After rank allocation, the ranks are separated according to the sample each observation came from, and the rank totals R1 (sample 1) and R2 (sample 2) are determined.

Finally, the U test is applied in the following manner:

U1 = n1n2 + n1(n1 + 1)/2 − R1
U2 = n1n2 + n2(n2 + 1)/2 − R2

Where, U = smaller of U1 and U2,
n1 = sample size of sample 1,
n2 = sample size of sample 2,
R1 = sum of the ranks of sample 1 and
R2 = sum of the ranks of sample 2

The mean and standard deviation are determined to calculate the limits of the acceptance region. The mean is calculated with the help of the following formula:

μU = n1n2/2

Where, n1 = sample size of sample 1 and
n2 = sample size of sample 2

The formula for the standard deviation is as follows:

σU = √[n1n2(n1 + n2 + 1)/12]

If the value of the U statistic lies within the limits of the acceptance region, the null hypothesis is accepted. However, if the calculated U value lies outside the limits of the acceptance region, the null hypothesis is rejected and the alternate hypothesis is accepted. Let us take a small example to understand the application of the Mann-Whitney test.
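
As a small illustration, scipy.stats.mannwhitneyu performs the merging, ranking and U calculation in one call; the two samples below are invented.

```python
from scipy.stats import mannwhitneyu

# Hypothetical independent samples - illustrative values only
group1 = [14, 18, 22, 25, 29, 31, 35]
group2 = [12, 13, 16, 19, 21, 24]

# mannwhitneyu() merges and ranks both samples and reports U for the
# first sample; the two-sided p-value is the same whichever U is used
u, p = mannwhitneyu(group1, group2, alternative='two-sided')
print(f"U = {u}, p-value = {p:.4f}")
```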

Kruskal-Wallis Test

The Kruskal-Wallis test is equivalent to one-way ANOVA (explained later in this chapter), with the one difference that the former is based on ranks while the latter is based on numerical values. The test is an extension of the Mann-Whitney U test: the Kruskal-Wallis test applies to more than two samples, whereas the Mann-Whitney U test applies to exactly two.

The Kruskal-Wallis test is used to determine whether the samples in a study are taken from the same population. In the test, the data from the different samples is merged and the values are ranked in a consistent order (low to high or high to low). The ranks are then grouped as R1, R2, …, Rk according to the samples to which they belong.

The test statistic H is computed in the following manner:

H = [12 / (n(n + 1))] × Σ (Ri² / ni) − 3(n + 1)

Where, n = total number of observations in all samples combined,
Ri = sum of the ranks of the i-th sample and
ni = number of observations in the i-th sample

The chi-square critical value is determined at k − 1 degrees of freedom (where k is the number of samples) and the specified level of significance, and the calculated H value is tested against it. If the H value lies within the limits of the acceptance region, the researcher accepts the null hypothesis and rejects the alternate hypothesis. However, if the H value lies outside the limits of the acceptance region, the researcher rejects the null hypothesis and accepts the alternate hypothesis.
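
A minimal sketch using scipy.stats.kruskal, which pools and ranks the data, computes H and evaluates it against the chi-square distribution; the three samples are invented for illustration.

```python
from scipy.stats import kruskal

# Hypothetical samples from three independent groups - illustrative only
g1 = [27, 31, 29, 35, 33]
g2 = [22, 25, 24, 28, 26]
g3 = [30, 36, 34, 38, 32]

# kruskal() pools and ranks all values, computes H and evaluates it
# against the chi-square distribution with k - 1 degrees of freedom
h, p = kruskal(g1, g2, g3)
print(f"H = {h:.4f}, p-value = {p:.4f}")
```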

Chi-square Test

The chi-square test is used to find out the dependency between two attributes. It can also be used to make comparisons between a theoretical population (expected data) and actual data (observed data). The formula used in the chi-square test is as follows:

χ² = Σ (Oi − Ei)² / Ei

Where, Oi = observed frequency and
Ei = expected frequency

Expected frequency can be calculated with the help of the following formula:

Ei = (Row total × Column total) / Grand total (for the test of independence)

If the calculated value of chi-square is greater than the critical value of χ², the null hypothesis is rejected.

Chi-square Test for Goodness of Fit

The test helps the researcher know whether, and to what extent, the theoretical distribution (the distribution of expected frequencies) fits the observed data. In the chi-square test, the researcher first finds the expected frequencies on the basis of the assumed distribution.

Thereafter, he/she calculates the chi-square value with the formula given above. The degrees of freedom used are k − 1, where k is the number of categories. The critical chi-square value is determined at the specified level of significance and d.f. If the calculated chi-square value lies within the limits of the acceptance region, the researcher accepts the null hypothesis and rejects the alternate hypothesis.
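
A short sketch of the goodness-of-fit test using scipy.stats.chisquare; the die-roll counts are invented, and the fair-die expectation plays the role of the theoretical distribution.

```python
from scipy.stats import chisquare

# Hypothetical counts from 100 rolls of a die, tested against the
# fair-die expectation of equal frequencies (illustrative numbers)
observed = [18, 22, 16, 14, 12, 18]
expected = [100 / 6] * 6

# chisquare() computes sum((Oi - Ei)^2 / Ei) with d.f. = k - 1
chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.4f}, p-value = {p:.4f}")
```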

Chi-square Test for Independence

In the chi-square test for independence, two attributes are tested to find out whether they are associated with each other.

For example, a researcher may want to know whether the introduction of better or unique services helps increase the sales of an organisation. In this case, the researcher is trying to establish a relation between two attributes: better services and sales.

In the chi-square test for independence, the expected frequencies are calculated first and then the value of chi-square is ascertained. The degrees of freedom used in this case are (r − 1)(c − 1), where r is the number of rows (levels of the first attribute) and c is the number of columns (levels of the second attribute). The critical chi-square value is determined at the specified level of significance and d.f. If the calculated chi-square value lies within the limits of the acceptance region, the null hypothesis is accepted and the alternate hypothesis is rejected, as in the sketch below.
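
Returning to the services-and-sales example, a hypothetical 2×2 table can be tested with scipy.stats.chi2_contingency, which derives the expected frequencies from the row and column totals and applies d.f. = (r − 1)(c − 1); the counts are invented.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: service level vs. sales outcome
#                sales up   sales flat/down
table = [[45, 15],           # better services introduced
         [30, 30]]           # no change in services

# chi2_contingency() derives expected counts from row/column totals
# and uses d.f. = (r - 1)(c - 1)
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.4f}, d.f. = {dof}, p-value = {p:.4f}")
```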


Multivariate Techniques and Their Uses

Multivariate indicates that several variables act together to produce a single outcome, which is why the vast majority of real-world situations are multivariate. We cannot predict the weather from the season alone, for example; pollution, humidity, precipitation and other factors all play a role.

Multivariate analysis, in its most basic form, is a statistical tool for determining the associations between two or more response variables. Multivariate approaches seek to describe reality in which each circumstance, product, or choice is influenced by multiple variables. When buying an automobile, for example, price, safety features, colour, and functionality may all be taken into account.

Uses of Multivariate Techniques

In consumer and market research, quality control and quality assurance, process optimisation and process control, and research and development, multivariate approaches are used to study data sets. These methods are especially significant in social science research because social scientists are unable to conduct randomised laboratory experiments like those used in medicine and the natural sciences.

Multivariate approaches can be used to statistically evaluate associations between different variables, to gauge how essential each one is to the final outcome and to identify where dependencies exist.

