D1.5 Determine the mean and the median and identify the mode(s), if any, for various data sets involving whole numbers and decimal numbers, and explain what each of these measures indicates about the data.
Skill: Explaining What Measures of Central Tendency Indicate About the Data
Statistical measures, such as the mean, are used to describe a set of data. Statistical measures are presented as part of the fourth step in the inquiry process, interpreting the results, because they are another way of attributing meaning to the data and can provide information upon which to base a decision.
Different statistical measures are commonly used in data management. Those studied in the junior grades are the range, the mode, the median, and the mean. Students need to have a clear understanding of what each represents in order to choose, identify, and use them appropriately.
Source: translated from Guide d’enseignement efficace des mathématiques, de la 4e à la 6e année, Traitement des données et probabilité, p. 107.
Knowledge: Mode
The mode of a data set is the value or category with the highest frequency. The mode is particularly significant in survey settings where it is necessary to determine what is most popular, most sold, most frequent, etc. As the examples below illustrate, it is possible to determine the mode of a quantitative or qualitative set of data.
Example 1
The table below shows the number of children in the families of students in a class. The most frequent number is 2, which indicates that there are more families with two children than any other number of children. The mode of this quantitative set of data is 2 children per family.
Number of Children in the Families of the Students in the Class
Number of Children in the Family | Number of Students |
---|---|
1 | 3 |
2 | 12 |
3 | 6 |
4 | 3 |
more than 4 | 2 |
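The reading of the table above can be sketched in Python. This is only an illustration: the function name `modes_from_frequencies` and the dictionary layout are assumptions made for this sketch, not part of the guide. The function returns every category that shares the highest frequency, so it also works for qualitative data or for a table with more than one mode.

```python
# Frequency table from Example 1: number of children -> number of students.
# The last category ("more than 4") is kept as a label, since a mode can be
# a category as well as a number.
children_per_family = {
    "1": 3,
    "2": 12,
    "3": 6,
    "4": 3,
    "more than 4": 2,
}


def modes_from_frequencies(frequencies):
    """Return every category whose frequency equals the highest frequency."""
    highest = max(frequencies.values())
    return [category for category, count in frequencies.items() if count == highest]


print(modes_from_frequencies(children_per_family))  # ['2']
```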
Example 2
The graph below shows the favourite colours of the students in the class. To determine the mode of this qualitative data, we need to look at the length of the bars. The red and blue bars are of equal length and are longer than all the others. So in this case, there are two modes, red and blue.
image The coloured five-bar graph is titled: "Students’ Favorite Colours". The horizontal axis is named "Colours", while the vertical axis is named "Number of Students". The yellow bar goes up to the number 4. The red bar goes up to the number 5. The green bar goes up to the number 3. The blue bar rises to the number 5, and the purple bar meaning "Other" rises to the number 4. With arrows, the word "mode" points to the red bar and the blue bar.
Example 3
The data below was recorded during a long jump competition.
1.04 m 1.06 m 1.12 m 1.13 m 1.16 m 1.19 m 1.22 m 1.28 m 1.36 m
Since all the values are different, there is no value with the highest frequency. This set of data has no mode. However, the values could be grouped into intervals, as in the table below. In such a case, the mode corresponds to the interval with the highest frequency, in other words, the interval from 1.10 m to 1.19 m.
Long Jumps
Length (m) | Number of Students |
---|---|
1.00 to 1.09 | 2 |
1.10 to 1.19 | 4 |
1.20 to 1.29 | 2 |
1.30 to 1.39 | 1 |
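The grouping step in Example 3 can be sketched as follows. The helper names `modes` and `interval_label` are hypothetical, and the rule "no mode when every value occurs equally often" is an assumption adopted for this sketch to match the reasoning above. The jumps are written in centimetres so that the grouping uses whole-number arithmetic.

```python
from collections import Counter

# Long jump results from Example 3, written in centimetres so that the
# grouping uses exact integer arithmetic (1.04 m -> 104 cm).
jumps_cm = [104, 106, 112, 113, 116, 119, 122, 128, 136]


def modes(values):
    """Return the most frequent value(s), or [] when no value stands out.

    Assumption: if every value occurs equally often, we report no mode.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == highest]


def interval_label(cm, width=10):
    """Label the 10 cm interval containing a jump, e.g. 116 -> '1.10 to 1.19'."""
    low = (cm // width) * width
    high = low + width - 1
    return f"{low / 100:.2f} to {high / 100:.2f}"


print(modes(jumps_cm))                               # []
print(modes([interval_label(j) for j in jumps_cm]))  # ['1.10 to 1.19']
```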
When using the mode to answer a question of interest or make a decision, it is important to consider the whole set of data. In some situations, the most frequent value or category may not be the one that best represents the data. It is important to encourage students to examine each situation closely before drawing conclusions based on the mode.
The following are examples of situations in which the appropriateness of using mode as a representative data value is evaluated:
- In Example 1 above, the mode of 2 children per family seems to be fairly representative of the situation since there is a significant difference between this frequency and the others.
- In Example 2 above, not only are there two modes (red and blue), but the difference between their frequencies and the other frequencies is not very large. It is therefore difficult to conclude that these two modes represent a strong colour preference. In this case, it would be better to mention that red and blue are slightly more popular, but that yellow follows closely behind.
- According to the stem-and-leaf plot below, the mode corresponds to 72 heartbeats per minute. This number is also part of the 70-79 interval that has the most data. Therefore, it represents this data set well.
- From the line plot below, the mode corresponds to 60 heartbeats per minute. This number is far from the interval that contains most of the data points (69 to 77). Also, the range of the data (29) is large, and each data value appears only once, twice, or three times, so it would be best to not use the mode to draw a conclusion about this set of data.
Source: translated from Guide d’enseignement efficace des mathématiques, de la 4e à la 6e année, Traitement des données et probabilité, p. 108-111.
Knowledge: Mean
In mathematics, the mean has a precise meaning; it corresponds to the value resulting from an equal-share division. For example, if 4 friends have collected $5, $7, $8 and $8 respectively and they pool these amounts to share them equally, each will receive $7. The mean of the amounts collected is therefore equal to $7. In more advanced mathematics, this mean is called the arithmetic mean. Other means exist (for example, geometric mean, harmonic mean), but they are not studied in the junior grades.
Educators should focus on understanding the concept of mean rather than memorizing the usual algorithm (sum of the data divided by the number of data). To do this, they should provide students with activities that use the equal-share model or the balance model between the sum of the shortages (differences between the mean and the data that are less than the mean) and the sum of the surpluses (differences between the mean and the data that are greater than the mean). Otherwise, students gain only a limited understanding of the concept of mean.
Equal Share
The examples below show different situations that use the equal-share model and help develop a good understanding of the concept of mean. The equal-share model can be used to determine a mean without having to use the standard algorithm.
Example 1
Amir, Bruno, Carla, Denis and Elmira went fishing and caught 2, 2, 3, 3 and 10 fish respectively. Determine the mean of the number of fish caught.
To determine the average number of fish caught, students can determine how many fish each person would have if the fish were evenly distributed. They can first illustrate the initial situation as follows.
Then, the students divide the fish: Elmira gives 2 fish to Amir, 2 fish to Bruno, 1 fish to Carla and 1 fish to Denis.
After sharing, each person has 4 fish, so students can conclude that, on average, the 5 friends caught 4 fish each. The mean is 4.
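The sharing in Example 1 can be imitated with a short Python sketch. The function `share_equally` is a hypothetical helper written for this illustration, and it assumes the total divides evenly among the people, as it does here. It moves one fish at a time rather than following the exact gifts described above, but it ends with the same equal shares.

```python
def share_equally(amounts):
    """Level out the amounts by moving one item at a time from the person
    with the most to the person with the least.

    Assumption: the total divides evenly among the people, as in Example 1.
    """
    shares = list(amounts)
    if sum(shares) % len(shares) != 0:
        raise ValueError("the total cannot be shared equally in whole items")
    while max(shares) != min(shares):
        giver = shares.index(max(shares))
        receiver = shares.index(min(shares))
        shares[giver] -= 1
        shares[receiver] += 1
    return shares


fish = [2, 2, 3, 3, 10]  # Amir, Bruno, Carla, Denis, Elmira
print(share_equally(fish))  # [4, 4, 4, 4, 4]
```

No division is used: the mean appears as the amount each person holds once the sharing is finished.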
To deepen students' understanding of the concept of mean, it is important to give them the opportunity to reverse the process by asking them to create a set of data with a given mean. This reinforces the concept of mean as the result of equal sharing.
Example 2
Six students in a class determined that they had, on average, 5 pens each. What might be a possible distribution of pens among these six students?
Since the 6 students have an average of 5 pens each, each student would have 5 pens after the equal share.
This gives a total of 30 pens (6 × 5). The 6 students can then divide the 30 pens among themselves as they see fit. Regardless of the division chosen, the mean value of 5 (average of five pens per student) will be maintained. Here is an example of a possible division:
There is always a total of 30 pens.
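As a rough check of this idea, the sketch below hands out the 30 pens at random several times and confirms that the mean stays at 5. The function `random_distribution` and its `seed` parameter are assumptions made for this illustration; any other way of dividing the pens would serve equally well.

```python
import random
from statistics import mean


def random_distribution(total, people, seed=None):
    """Hand out `total` items one at a time to randomly chosen people."""
    rng = random.Random(seed)
    shares = [0] * people
    for _ in range(total):
        shares[rng.randrange(people)] += 1
    return shares


students = 6
target_mean = 5
total_pens = students * target_mean  # 6 x 5 = 30 pens

for seed in range(3):
    pens = random_distribution(total_pens, students, seed)
    # The distribution changes, but the total and the mean do not.
    print(pens, sum(pens), mean(pens))
```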
Another way to help students develop an understanding of the concept of a mean is to ask them to identify a missing piece of data for a set of data to have a particular mean.
Example 3
Five students are raising money. If the students raise an average of $25 each, they win tickets to a hockey game. On Monday, 4 of the 5 students meet and find that they have raised $29, $21, $31 and $13 respectively. What is the minimum amount of money that Suzie, the 5th student, must have raised if the group is to win the hockey tickets?
Students who have only learned to use the usual algorithm for determining the mean are often unable to answer this type of question: because they do not understand the procedure they have learned, they cannot adapt it to new circumstances. Students who have developed an understanding of the concept of mean as a share are better equipped to solve this type of problem. Here is an exchange that the four students might have to determine the amount of money Suzie would have to collect.
Student 1: We need to get an average of $25, which means that if we split the money equally between us, we will each have $25. I raised $29, so I can share $4 with you.
Student 2: I only got $21, so I am $4 short.
Student 3: I have an extra $6 because I collected $31.
Student 4: I'm sorry, I was sick this weekend and only raised $13. I'm $12 short.
Student 1: Let's use the idea of sharing to help us determine how much money Suzie will need to have collected to achieve an average of $25.
image Under the title "Before Sharing", five boxes are lined up side by side. The first box contains 25 chips. Above it, a set of 4 chips is linked to the second box by an arrow. The second box contains 21 chips. The third box contains 25 chips, and above it, a set of 6 chips points to the fourth box with an arrow. The fourth box contains 13 chips. The fifth box contains no chips, but has a question mark.
After the sharing, the first three students each have $25, but the fourth student is still $6 short of $25. Suzie must therefore bring in the missing $6 in addition to her own $25, so she must have collected at least $31.
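The same answer can be checked numerically. The sketch below is a minimal illustration, with the hypothetical helper `missing_value_for_mean` standing in for the students' reasoning: after an equal share, all 5 students would hold the mean, so the total needed is known and the missing amount is whatever has not yet been collected.

```python
def missing_value_for_mean(known_values, target_mean, count):
    """Return the value that brings `count` values to `target_mean`.

    After an equal share, everyone holds the mean, so the total must be
    target_mean * count. Whatever is not yet collected is the missing value.
    """
    required_total = target_mean * count
    return required_total - sum(known_values)


raised = [29, 21, 31, 13]  # the four students who met on Monday
print(missing_value_for_mean(raised, target_mean=25, count=5))  # 31
```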
Balance Between the Sum of the Surpluses and the Sum of the Shortages
Teachers can also help students develop an understanding of the concept of mean by presenting them with situations that involve the surplus/shortage balance model. This model, a variation of the sharing model, is perhaps less familiar. It is based on the idea that if, for example, a group of students has an average number of tokens, some students might have fewer than the average while others might have more. However, the total of what the students have less of must equal the total of what the students have more of. The two examples below illustrate this idea.
Example 1
Five people have 4 tokens each. Annie gives 1 token to Carl and 1 token to Daniel. Bahéya gives 1 token to Carl and 2 tokens to Eva.
image Five sets of 4 blue chips lined up side by side have the following names respectively: Annie, Bahéya, Carl, Daniel, Eva. Above the first set, arrows point to the third and fourth sets respectively. On the second set, an arrow points to the third set; two chips are surrounded and linked to the fifth set by an arrow.
The mean is still 4 tokens per person. However, compared to the mean, Annie has a shortage of 2 tokens, Bahéya has a shortage of 3 tokens, Carl has a surplus of 2 tokens, Daniel has a surplus of 1 token, and Eva has a surplus of 2 tokens. Therefore, compared to the mean, the sum of the shortages is 5 tokens (2 + 3) and the sum of the surpluses is 5 tokens (2 + 1 + 2). We can see that the sum of the shortages equals the sum of the surpluses.
image Five sets of blue chips lined up side by side have the following names respectively: Annie, Bahéya, Carl, Daniel, Eva. The first set contains 2 chips and indicates a shortage of 2. The second set contains 1 chip and indicates a shortage of 3. The third set contains 6 chips and indicates a surplus of 2. The fourth set contains 5 chips and indicates a surplus of 1. The fifth set contains 6 chips and indicates a surplus of 2. Below the sets appears the following equation: sum of the shortages (2 + 3) = sum of the surpluses (2 + 1 + 2).
The surplus-shortage equilibrium model can be used to solve a variety of problems involving the mean. The situation below, presented in Example 3 of the Equal Share section above and solved using the sharing model, can just as easily be solved using the surplus-shortage equilibrium model.
Example 2
Five students are raising money. If the students raise a mean of $25 each, they win tickets to a hockey game. On Monday, 4 of the 5 students meet and find that they have raised $29, $21, $31 and $13 respectively. What is the minimum amount of money that Suzie, the 5th student, must have raised if the group is to win the hockey tickets?
First, look at each amount of money in relation to the mean of $25, and then determine the surplus or shortage.
29: $4 surplus
21: $4 shortage
31: $6 surplus
13: $12 shortage
Next, determine the sum of the surpluses and the sum of the shortages.
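The sum of the surpluses is $10 ($4 + $6) and the sum of the shortages is $16 ($4 + $12). These two amounts are not equal, so to balance them, a further surplus of $6 ($16 - $10) is needed.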
Therefore, Suzie must have collected $25 + $6, or $31.
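The balance between surpluses and shortages in the two examples of this section can be sketched as follows. The function name `surpluses_and_shortages` is an assumption made for this illustration; the token counts are those held by the five people after the exchanges in Example 1.

```python
def surpluses_and_shortages(values, mean):
    """Return (sum of surpluses, sum of shortages) relative to `mean`."""
    surplus = sum(v - mean for v in values if v > mean)
    shortage = sum(mean - v for v in values if v < mean)
    return surplus, shortage


# Example 1: tokens held by Annie, Bahéya, Carl, Daniel and Eva after the gifts.
tokens = [2, 1, 6, 5, 6]
print(surpluses_and_shortages(tokens, mean=4))  # (5, 5)

# Example 2: the four known amounts compared with the target mean of $25.
raised = [29, 21, 31, 13]
surplus, shortage = surpluses_and_shortages(raised, mean=25)
print(surplus, shortage)  # 10 16

# Suzie must supply the mean plus whatever surplus is still missing.
suzie = 25 + (shortage - surplus)
print(suzie)  # 31
```

When the two sums are equal, as with the tokens, the value used for comparison is the mean of the data.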
Once students have developed a good understanding of the concept of mean, they are able to:
- determine the mean of a data set and understand the relationship between the data and the mean;
- create a set of data that corresponds to a particular mean and understand that the same mean can come from more than one set of data;
- determine a missing data item from a set of data to obtain a particular mean and understand the effect on the mean of adding new data.
Only when this understanding is achieved should students be introduced to the usual algorithm for calculating the mean, which is:
mean = (sum of the data values) ÷ (number of data values)
Note that focusing on the conceptual understanding of the mean helps students avoid some of the following conceptual errors that were identified by Konold and Higgins (2003, pp. 203-204):
- some students confuse the mean with the mode, in other words, they associate it with the most frequent value;
- some students associate the mean only with an algorithm, which makes it difficult for them to create a data set that corresponds to a particular mean;
- some students mistake the mean for the median, that is, they associate it with the value in the center of the data set.
Even with a good understanding of statistical measures, it is not always easy to choose the best measure for a given situation in a decision-making context, so in the junior grades it is best to stick to simple situations.
Example 3
Matthew wants to negotiate with his parents for an increase in the amount of weekly allowance they give him. Knowing that he cannot simply ask for an increase without good reason, he decides to survey his fellow students to find out how much allowance they receive each week. He organizes the data collected on a line plot.
image The number line shows weekly amounts of pocket money in dollars, with "X's". At zero, there are four X's. At one, there is one "X". At one and a half, there are two X's. At two, there are two X's. At two and a half, there is an "X". At three, there is an "X". At three and a half, there are four X's. At four, there are five X's. At four and a half, there are four "X's". At five, there are five X's. At five and a half, there are two "X's". At six, there are four "X's". At six and a half, there are no "X's". At seven, there is one "X". At seven and a half, there is no "X". At eight, there is an "X". At eight and a half, nine and nine and a half, there is no "X". And at ten, there are six X's.
Matthew then analyzes this data to choose a value on which to base his arguments for an increase. He finds that $10 is the most common value. He goes to his father and explains that his survey shows that more students receive $10 in spending money than any other amount.
His father finds this amount rather high and asks to see the set of data. After reviewing it, he explains to Matthew that because of the distribution of the data, mode is not the best measure to represent this data. Then he determines that the mean of the amounts allocated is $4.66 and tells Matthew that this amount seems more appropriate.
A bit disappointed, Matthew calculates the median and finds that it is $4.50. He understands that in this situation, both the mean and the median are good measures to represent the data, but since the median is lower than the mean, he decides that it is not to his advantage to use it.
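The three measures in this example can be recomputed from the line plot. The sketch below assumes the counts read from the plot description above and uses Python's standard `statistics` module (`multimode` requires Python 3.8 or later); the variable names are chosen for this illustration only.

```python
from statistics import mean, median, multimode

# Line plot read as {weekly allowance in dollars: number of students}.
allowance_counts = {
    0: 4, 1: 1, 1.5: 2, 2: 2, 2.5: 1, 3: 1, 3.5: 4, 4: 5,
    4.5: 4, 5: 5, 5.5: 2, 6: 4, 7: 1, 8: 1, 10: 6,
}

# Expand the counts into one value per student.
allowances = [amount for amount, n in allowance_counts.items() for _ in range(n)]

print(len(allowances))             # 43
print(multimode(allowances))       # [10]
print(round(mean(allowances), 2))  # 4.66
print(median(allowances))          # 4.5
```

The mode sits at the far end of the data, while the mean and the median both fall near the middle of the distribution, which is why Matthew's father prefers them here.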
Source: translated from Guide d’enseignement efficace des mathématiques, de la 4e à la 6e année, Traitement des données et probabilité, p. 115-125.
Knowledge: Median
The median of a set of data is the value in the middle of the set of data. This means that there are the same number of values on either side of the median. To determine the median of an odd number of data values, simply put the data in ascending or descending order and identify the value in the middle. In the case of an even number of values, the median is the value that is halfway between the two values in the middle. In such cases, the median could be a value that is not part of the set of data.
Example 1
The data below was recorded during a long jump competition.
1.04 m 1.06 m 1.12 m 1.13 m 1.16 m 1.19 m 1.22 m 1.28 m 1.36 m
There are 9 values, and they are put in ascending order. The median of these values is the fifth one, which is 1.16 m. Notice that there is the same number of values (4) on each side of the median.
Example 2
The stem-and-leaf plot below shows 22 values in ascending order. There are two values in the middle, the 11th and 12th. There is the same number of data points (10) on either side of these two values. Since these two values correspond to 72 heartbeats per minute, then this is the value assigned to the median.
Example 3
During a fundraiser for their sports team, 10 students sold boxes of chocolate. Here is the number of boxes sold:
15 12 11 10 10 8 7 6 5 5
The 5th and 6th values are in the middle of this set of 10 data items put in descending order. These two values, 10 and 8, are different. The median is then the number 9, since 9 is halfway between 8 and 10. Despite the fact that this median is not part of the set of data, we can see that there are five values on either side of 9.
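The procedure for an odd and an even number of values can be sketched in a few lines. The function below is a minimal illustration written for this section rather than a library routine, and it is applied to the data from Examples 1 and 3.

```python
def median(values):
    """Order the values and return the middle one, or the value halfway
    between the two middle ones when the count is even."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


long_jumps = [1.04, 1.06, 1.12, 1.13, 1.16, 1.19, 1.22, 1.28, 1.36]
boxes_sold = [15, 12, 11, 10, 10, 8, 7, 6, 5, 5]

print(median(long_jumps))  # 1.16
print(median(boxes_sold))  # 9.0
```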
Example 4
Here is the data for the maximum daily temperature, in degrees Celsius, in a city in June.
Maximum Temperatures in June (°C) | | | | | | |
---|---|---|---|---|---|---|
21 | 16 | 17 | 16 | 14 | 12 | 20 |
19 | 20 | 18 | 21 | 21 | 25 | 26 |
26 | 28 | 27 | 27 | 23 | 21 | 25 |
24 | 29 | 29 | 32 | 33 | 30 | 29 |
33 | 28 | | | | | |
Students must first put the values in ascending order by writing them in an intermediate stem-and-leaf plot as follows.
Stem | Leaves |
---|---|
1 | 6 7 6 4 2 9 8 |
2 | 1 0 0 1 1 5 6 6 8 7 7 3 1 5 4 9 9 9 8 |
3 | 2 3 0 3 |
They can then place the leaves in each row in ascending order and obtain the following plot.
Stem | Leaves |
---|---|
1 | 2 4 6 6 7 8 9 |
2 | 0 0 1 1 1 1 3 4 5 5 6 6 7 7 8 8 9 9 9 |
3 | 0 2 3 3 |
To help students determine the median, teachers may suggest that they write the values in ascending order on a strip of paper and fold it as follows.
Using the paper strip helps students develop a better understanding of the concept of median. By folding the strip in half, the values are paired (first to last, second to second to last, and so on). Students discover that there are two values in the middle, 24 and 25. They also find that there are no whole numbers between 24 and 25. They then use their knowledge of decimal numbers to determine that it is the value 24.5 that is in the middle between 24 and 25. The median of this data set is therefore 24.5°C.
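The steps of Example 4 can be retraced with a short sketch. The helper `stem_and_leaf` is a hypothetical function written for this illustration and assumes two-digit whole-number values; the median itself comes from Python's standard `statistics` module.

```python
from statistics import median

june_max_temps = [
    21, 16, 17, 16, 14, 12, 20,
    19, 20, 18, 21, 21, 25, 26,
    26, 28, 27, 27, 23, 21, 25,
    24, 29, 29, 32, 33, 30, 29,
    33, 28,
]


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) with sorted units (leaves)."""
    plot = {}
    for value in sorted(values):
        plot.setdefault(value // 10, []).append(value % 10)
    return plot


for stem, leaves in stem_and_leaf(june_max_temps).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

ordered = sorted(june_max_temps)
print(ordered[14], ordered[15])  # 24 25 (the 15th and 16th values)
print(median(june_max_temps))    # 24.5
```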
Example 5
Let's go back to Example 3 and add a variant as follows:
During a fundraiser for their sports team, 10 students sold boxes of chocolate. Here is the number of boxes sold.
15 12 11 10 10 8 7 6 5 5
However, three other students have not yet reported the number of boxes they sold. If the goal was to obtain a median number of 8 boxes sold, what data set corresponding to the sales of the 13 students could meet this goal?
Here are two examples of possible answers.
15 12 11 10 10 8 8 7 7 6 5 5 4
15 12 11 10 10 9 8 7 6 5 5 4 3
If we add, as a condition, that the mode of the data set also corresponds to 8 boxes, students could give the following answer.
15 12 11 10 10 8 8 8 7 6 5 5 4
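The proposed answers can be checked with a short sketch. The helper `extends` is a hypothetical function written for this illustration; it confirms that each answer keeps the 10 reported values and adds exactly 3 new ones, after which the median and the mode(s) are computed with Python's standard `statistics` module (`multimode` requires Python 3.8 or later).

```python
from collections import Counter
from statistics import median, multimode

reported = [15, 12, 11, 10, 10, 8, 7, 6, 5, 5]  # the 10 known sales


def extends(candidate, original, extra):
    """True if `candidate` is `original` plus exactly `extra` new values."""
    missing = Counter(original) - Counter(candidate)
    return not missing and len(candidate) == len(original) + extra


answers = [
    [15, 12, 11, 10, 10, 8, 8, 7, 7, 6, 5, 5, 4],
    [15, 12, 11, 10, 10, 9, 8, 7, 6, 5, 5, 4, 3],
    [15, 12, 11, 10, 10, 8, 8, 8, 7, 6, 5, 5, 4],
]

for data in answers:
    print(extends(data, reported, 3), median(data), multimode(data))
```

All three sets have a median of 8 boxes, but only the third has 8 as its single mode.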
Source: translated from Guide d’enseignement efficace des mathématiques, de la 4e à la 6e année, Traitement des données et probabilité, p. 111-115.