In this lesson, we’ll use real-life examples and charts to learn about restriction of range, a statistical technique in which only part of the data available is used to find the connection between two variables or quantities.
What Is Restriction of Range?
Just for a minute, imagine that you’re a school administrator looking to find the grade point average (GPA) of the Class of 2015 at Central High School in Somewhere, U.S.A.
However, instead of calculating the mean (or average) GPA of the entire graduating class, you calculate the mean GPA for only the 25 students in the senior English honors class. In this scenario, you’ve restricted the range. In the field of statistics, restricting the range means limiting the data to those that meet some criterion. In other words, you use a subset of the data to determine whether two pieces of information are correlated, or connected.
Let’s look at another example. Let’s say you’re interested in the exercise habits of young adult Americans but only poll college students with an athletic scholarship. In both examples, you used data from a sample whose results would differ from those found in the general population. As such, your results will not satisfy the original inquiry: students in an English honors class will most likely have higher grades than the average of the entire student body, while college students on athletic scholarships will most likely spend more time exercising than young adults in general.
The Correlation Coefficient
When you plot a set of data on an xy grid, the correlation coefficient is a number between -1 and 1 that tells you how closely the data points fall along a straight line, or how strong their linear relationship is.
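To make the definition concrete, here is a minimal sketch of how the correlation coefficient can be computed by hand. It assumes the usual Pearson formula (the lesson does not name one), and the function name `pearson_r` and the sample points are made up for illustration; they are not the lesson’s data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative points (assumption, not from the lesson).
perfect_line = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])      # points on a rising line
falling_line = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])      # points on a falling line
no_linear_trend = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])  # no overall slope
print(perfect_line, falling_line, no_linear_trend)
```

Points lying exactly on a rising line give a coefficient of 1, points on a falling line give -1, and the third set, which has no overall upward or downward trend, gives 0.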
A correlation coefficient of zero means the two variables do not share a linear relationship. The correlation gets stronger as the coefficient gets closer to 1 or -1, which means the data points fall closer to a line. When the underlying relationship is linear, restricting the range of a data set usually causes the correlation coefficient to move toward zero.
This chart shows how 21 students did on a test in relation to how many minutes they spent studying.
If you enter the data set into a graphing calculator and ask it to run a linear regression, or relationship model, with a correlation coefficient, your result will be .746. According to the results, test scores and time studying are linearly correlated. In everyday language, the more time the 21 students spent studying, the higher they tended to score on the test.
Now, let’s take a look at another chart that includes the same information found on the first chart but uses a restricted range.
In this chart, students who studied very little and those who studied a lot have been set aside. Using only the data in the yellow section, we can run the same linear regression while restricting the range to students who studied between 30 and 60 minutes. This time, the correlation coefficient is only .509, a much weaker indication that the data is linearly correlated.
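The same effect can be reproduced with a short sketch. The lesson’s actual chart data is not available here, so the code below uses synthetic data as a stated assumption: 21 study times from 0 to 100 minutes, scores that rise 0.4 points per minute, and an alternating +8 / -8 term standing in for random scatter. The helper `pearson_r` is a hypothetical name, and the coefficients it prints will not match the lesson’s .746 and .509.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic data (assumption, not the lesson's chart): 21 students,
# study times 0, 5, ..., 100 minutes, with alternating +8 / -8 scatter.
minutes = [5 * i for i in range(21)]
scores = [50 + 0.4 * m + (8 if i % 2 == 0 else -8)
          for i, m in enumerate(minutes)]

r_full = pearson_r(minutes, scores)

# Restrict the range: keep only students who studied 30 to 60 minutes.
kept = [(m, s) for m, s in zip(minutes, scores) if 30 <= m <= 60]
r_restricted = pearson_r([m for m, _ in kept], [s for _, s in kept])

print(f"full range:         r = {r_full:.3f}")
print(f"30-60 minutes only: r = {r_restricted:.3f}")
```

The scatter around the trend is the same size in both cases, but the restricted subset covers a much narrower spread of study times, so the trend accounts for less of the variation in scores and the coefficient comes out closer to zero.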
Why Restricting the Range Is Helpful
There are times when restricting the range of a set of data can be useful. For example, in many sets of data the x and y variables do not follow a linear correlation.
Graphs of other relationships, such as quadratic, cubic, exponential or higher-degree polynomial functions, do not form straight lines. The correlation coefficient for a non-linear data set can be close to zero even when the two variables are strongly related, which you can address by restricting the range. While the data may not be linear in its entirety, you can examine the portion that comes closest to forming a straight line.
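As an illustration, here is a minimal sketch using a quadratic relationship. The data (y equal to x squared for whole numbers from -5 to 5) and the helper name `pearson_r` are assumptions chosen for this example, not taken from the lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative quadratic data (assumption): y = x^2 for x = -5, ..., 5.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]

# Over the whole range, the falling left half cancels the rising right half.
r_full = pearson_r(xs, ys)

# Restrict the range to the right-hand branch, x >= 0.
right = [(x, y) for x, y in zip(xs, ys) if x >= 0]
r_right = pearson_r([x for x, _ in right], [y for _, y in right])

print(f"full range:  r = {r_full:.3f}")
print(f"x >= 0 only: r = {r_right:.3f}")
```

Across the full range the coefficient is zero even though y is completely determined by x. On the restricted branch the points curve steadily upward, so the coefficient is close to 1, although the relationship there is still not exactly linear.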
Lesson Summary
In statistics, restricting the range is a process by which you limit your data set to a subset of the data. You can use it to see if two sets of information or variables are connected, or correlated. However, this technique usually weakens the validity of your conclusions because it does not take into account all of the available information.
The correlation coefficient of a data set on an xy grid is a number between -1 and 1. The closer it is to 1 or -1, the stronger the linear relationship between your variables. A correlation coefficient of zero means your variables do not have a linear relationship. When data is predominantly non-linear, restricting the range can be helpful and provide a strong correlation for a portion of the data.