Measures of Relationship
Measures of relationship study the relationship between two or more variables in a given data series. When the relationship between two variables in a population is studied, it is known as a bivariate population. When more than two variables are studied, it is known as a multivariate population. The relationship among variables can be of two types – correlation and cause and effect. Based on these relationships, there are two types of analysis, namely Correlation Analysis and Regression Analysis.
Type of Relationship
Correlation analysis is used to study the association between different types of variables. It measures the extent to which one variable is linearly related to another. Different tools are used to study the correlation pattern between variables, including rank correlation and simple correlation. Let us discuss each tool.
Rank correlation refers to the correlation between two data series in which the data is ranked. It is generally used when the data is qualitative in nature. It was given by Charles Spearman and is therefore also known as Spearman’s coefficient of correlation. It measures the degree of relationship between two ranked variables.
The formula to calculate rank correlation is as follows:

ρ = 1 – (6 Σdi²) / (n(n² – 1))

Where, di = Difference between the ranks of the ith pair of observations
n = Number of pairs of observations
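The rank-correlation formula above can be sketched in Python as follows. The function names and the sample data are illustrative, and this minimal version does not handle tied ranks:

```python
# A minimal sketch of Spearman's rank correlation using
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).
# Illustrative only; ties are not handled here.

def rank(values):
    """Assign ranks 1..n by sorted order (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman_rho(x, y):
    rx, ry = rank(x), rank(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Perfectly concordant rankings give rho = 1.0
print(spearman_rho([10, 20, 30, 40], [1, 2, 3, 4]))  # 1.0
```

Perfectly reversed rankings would give ρ = –1.0, and unrelated rankings a value near 0.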
Simple correlation is used to find the degree of linear relationship between two variables. It is the most commonly used measure to describe the relationship between two linearly related variables. It was given by Karl Pearson and is therefore also known as Karl Pearson’s coefficient of correlation.
Simple correlation can be of three types, as given in Figure:
The strength of association between two variables depends on the calculated value of the correlation coefficient and the sample size. The value of the correlation coefficient lies in the range –1 to +1.
- If the value of the correlation coefficient is close to –1 and the sample size is sufficiently large, then there is a strong negative correlation between two variables. For example, if the coefficient of correlation is –0.8, then there is a strong negative association between variables.
- If the value of the correlation coefficient is close to +1 and the sample size is sufficiently large, then there is a strong positive correlation between two variables. For example, if the coefficient of correlation is 0.8, then there is a strong positive association between variables.
- If the correlation coefficient is not close to –1 or + 1 and the sample size is sufficiently large, then there is weak correlation between two variables. For example, if the coefficient of correlation is 0.3 or –0.3, then the association between variables is weak.
The formula used to calculate simple correlation is as follows:

r = Σ(Xi – X̄)(Yi – Ȳ) / (n × Sx × Sy)

Where, Xi = ith value of the X variable
X̄ = Mean of the X variable
Yi = ith value of the Y variable
Ȳ = Mean of the Y variable
n = Number of pairs of observations
Sx = Standard deviation of X
Sy = Standard deviation of Y
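The deviation-score formula above can be sketched directly in Python. The function name and data are illustrative; Sx and Sy are taken as population standard deviations (dividing by n), consistent with the formula:

```python
# A sketch of r = sum((Xi - Xbar)(Yi - Ybar)) / (n * Sx * Sy),
# with Sx, Sy as population standard deviations. Data is illustrative.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    covariation = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return covariation / (n * s_x * s_y)

# A perfectly linear pair of variables gives r = 1.0
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 2))  # 1.0
```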
Let us learn to calculate simple correlation between two variables with the help of an example. Suppose you want to study the correlation between the age and weight of a group of people to find out the relation between the two.
Table shows the data:
| Number of Observations | Age (Xi) | Weight (Yi) | Xi² | Yi² | XiYi |
| --- | --- | --- | --- | --- | --- |
| Totals (n = 25) | 698 | 1316 | 22458 | 75464 | 40846 |
The calculation of correlation is as follows:
r = (25 × 40846 – 698 × 1316) / √[(25 × 22458 – 698²) × (25 × 75464 – 1316²)]
r = 102582 / √(74246 × 154744)
r ≈ 0.96
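The worked example can be checked in Python from the column totals alone, using the computational form of Pearson’s formula. The function name is illustrative; the figures are those from the table above:

```python
# Verifying the worked example: Pearson's r computed directly from
# n and the column totals (sum X, sum Y, sum X^2, sum Y^2, sum XY).
from math import sqrt

def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_from_sums(25, 698, 1316, 22458, 75464, 40846)
print(round(r, 2))  # 0.96
```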
Correlation does not necessarily imply causality. However, if the correlation between two variables is very high, it may be indicative of causality, i.e., a situation where one variable denotes the cause and the other denotes its effect. For example, if X and Y are correlated, the causal relationship inferred from the correlation between them may be that X is a cause of Y, that Y is a cause of X, or that both X and Y are caused by some other variable Z.
In common parlance, regression analysis (whether simple or multiple) is also termed causal analysis. Causality between different variables can be understood using causal analysis, and cause and effect is measured using simple or multiple regression. Regression is one step ahead of correlation in identifying the relationship between two variables.
This is because regression allows for prediction of values within the given data range. In simple language, if we know X, we can predict Y, and if we know Y, we can predict X. This is possible with the help of an equation called the regression equation. The variable Y is generally termed the dependent or criterion variable, and the variable X is termed the independent or predictor variable.
The regression equation is generally used to predict the values of Y based on the values of X. However, it cannot be concluded from this alone that Y is caused by X. Before making such an interpretation, the researcher must thoroughly understand the variables under study and the circumstances or context in which they operate.
The regression equation can be written as below:
Y = α + βX
Y represents scores on Y variable
X represents scores on X variable
α represents regression constant in the sample
β represents regression coefficient in the sample
α and β are calculated with the following formulae:

β = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)²
α = Ȳ – βX̄
Simple regression analysis is useful in a number of situations; for example, it can be used to analyse the relationship between the number of consumers (independent variable) and a month’s product sales (dependent variable). In regression analysis, the regression equation is fitted to the data using the least squares method.
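The least squares fit described above can be sketched in Python. The function name and the consumer/sales figures are hypothetical, used only to illustrate the method:

```python
# A minimal sketch of fitting Y = alpha + beta * X by least squares:
# beta  = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
# alpha = Ybar - beta * Xbar
# The consumer/sales data below is illustrative, not from the text.

def fit_simple_regression(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Hypothetical monthly data: number of consumers vs. product sales
consumers = [100, 150, 200, 250, 300]
sales = [20, 29, 41, 50, 60]
alpha, beta = fit_simple_regression(consumers, sales)

# Predict sales for a month with 220 consumers using Y = alpha + beta * X
predicted = alpha + beta * 220
print(round(beta, 3), round(predicted, 1))  # 0.202 44.0
```

Prediction is only meaningful within the range of the fitted data, as the section notes.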