STATISTICAL REPRESENTATION MEASURES OF CENTRAL TENDENCY PART 1
Business Mathematics & Statistics
OBJECTIVES
The objectives of the lecture are to learn about:
- Review Lecture 18
- Statistical Representation
- Measures of Central Tendency
LINE GRAPHS
Line graphs are the most commonly used graphs. In the following graph, you can see the occurrence of death due to cancer in males and females. After the age of 40, the occurrence of cancer is much greater in males. The line graph of heart disease also shows that the disease is more prominent in males. As you can see, line graphs help us to understand the trends in data very clearly.
Another line graph of temperature in 4 cities A, B, C and D shows that although the general pattern is similar, the temperature in city A is lowest followed by D, B and C. In city C the highest temperature is close to 30 whereas in city A and B it is about 25. The highest temperature in city D is about 28 degrees.
Central Tendency
The term central tendency refers to the middle value (sometimes a typical value) of the data. Measures of central tendency are measures of the location of the middle or the center of a distribution. The “Mean” is the most commonly used measure of central tendency.
MEAN
Also known as the arithmetic mean, the mean is typically what is meant by the word average. The mean is perhaps the most common measure of central tendency. The mean of a variable is given by (the sum of all its values)/(the number of values). For example, the mean of 4, 8, and 9 is
(4 + 8 + 9)/3 = 7
Example:
Marks: 58, 69, 73, 67, 76, 88, 91 and 74 (8 marks).
Sum = 596
Mean = 596/8 = 74.5
Please note that the mean is affected by extreme values.
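The calculation above can be checked with a short Python sketch. The marks are the ones from the example. The extra mark of 5 is a hypothetical value, not part of the lecture data, added only to suggest how a single extreme value can pull the mean.

```python
# Marks from the example above.
marks = [58, 69, 73, 67, 76, 88, 91, 74]

total = sum(marks)
mean = total / len(marks)
print(total, mean)  # 596 74.5

# Hypothetical extreme mark (an assumption for illustration only).
with_outlier = marks + [5]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(round(mean_with_outlier, 2))  # 66.78
```

One unusually low mark moves the mean from 74.5 to about 66.8, even though the other eight marks are unchanged.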
MEDIAN
Another typical value is the median. To find the median of a set of values, first arrange the data in ascending or descending order, then locate the middle value. If there is an odd number of data points, the median is the middle value. If there is an even number of data points, the median is the mean of the two middle values.
The median is easier to find than the mean, and unlike the mean it is not affected by values that are unusually high or low.
Example:
3 6 11 14 19 19 21 24 31 (9 values)
The data contain 9 values, so the median is the middle (fifth) value, i.e. 19.
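The odd and even cases of the rule can be sketched in Python as follows. The function name `median` is just an illustrative choice. The even case is tried on the eight marks from the mean example, which the lecture itself does not work through.

```python
def median(values):
    """Return the median: sort, then take the middle value (odd count)
    or the mean of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Odd number of values: the example above.
print(median([3, 6, 11, 14, 19, 19, 21, 24, 31]))  # 19

# Even number of values: the eight marks from the mean example.
print(median([58, 69, 73, 67, 76, 88, 91, 74]))  # 73.5
```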
MODE
The most common score in a set of scores is called the mode. There may be more than one mode, or no mode at all.
Example:
2 2 1 2 0 3 2 1 1 4 1 1 1 2 2 0 3 2 1
In this list, 1 and 2 each occur seven times, more often than any other value, so the data have two modes: 1 and 2.
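A tally of the scores can be sketched in Python with `collections.Counter`. This is only one possible way to find the mode. It reports every value tied for the highest count, because a data set may have more than one mode.

```python
from collections import Counter

# Scores exactly as listed in the example above.
scores = [2, 2, 1, 2, 0, 3, 2, 1, 1, 4, 1, 1, 1, 2, 2, 0, 3, 2, 1]

counts = Counter(scores)
highest = max(counts.values())

# Every value whose count equals the highest count is a mode.
modes = sorted(value for value, count in counts.items() if count == highest)

print(dict(sorted(counts.items())))  # {0: 2, 1: 7, 2: 7, 3: 2, 4: 1}
print(modes)  # [1, 2] (1 and 2 each occur 7 times in the list as printed)
```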
Mean, Median and Mode: Description/Explanation
- Mean: the sum of all the results included in the sample divided by the number of observations.
- Median: the middle value of all the numbers in the sample. For an even number of values, the median is the average of the middle two values; for an odd number of values, the median is the middle value.
- Mode: the most frequently observed value of the measurements in the sample. There can be more than one mode or no mode.
Advantages
- Mean: takes all numbers into account equally.
- Median: fairly easy to calculate; half of the sample (normally) lies above the median.
- Mode: quick and easy to calculate.
Disadvantages
- Mean: more tedious to calculate than the other two; can be affected by a few very large (or very small) numbers.
- Median: tedious to find for a large sample which is not in order.
- Mode: may not be representative of the whole sample.
ORGANISING DATA
There are many different ways of organizing data.
Numerical data can be organized in any of the following forms:
- The Ordered Array and Stem-leaf Display
- Tabulating and Graphing Numerical Data
- Frequency Distributions: Tables, Histograms, Polygons
- Cumulative Distributions: Tables, the Ogive
Stem and Leaf Display
A stem and leaf display (also called a stem and leaf plot) is particularly useful when the data is not too numerous.
For example, 21 = 20 + 1 = (10 × 2) + 1, so it is represented in the plot as a stem of 2 and a leaf of 1. The digit in the tens place is taken as the stem and the digit in the units place is taken as the leaf. Similarly, 26 is represented in the plot as a stem of 2 and a leaf of 6. Remember, each stem is displayed once, and a leaf can take any value from 0 to 9.
Example
Consider Figure 1. It shows the number of touchdown (TD) passes thrown by each of the 31 teams in the National Football League in the 2000 season.
Figure 1: Number of touchdown passes.
A stem and leaf display of the data is shown in Table 1 below. The left portion of the table contains the stems. They are the numbers 3, 2, 1, and 0, arranged as a column to the left of the bars. (For example, in 34, 3 is the stem and 4 is the leaf; in 16, 1 is the stem and 6 is the leaf.)
Stem and leaf display showing the number of passing touchdowns.
3|2337
2|001112223889
1|2244456888899
0|69
Table 1
To make this clear, let us examine Table 1 more closely. In the top row, the four leaves to the right of stem 3 are 2, 3, 3, and 7. Combined with the stem, these leaves represent the numbers 32, 33, 33, and 37, which are the numbers of TD passes for the first four teams in Figure 1. The next row has a stem of 2 and 12 leaves. Together, they represent 12 data points. We leave it to you to figure out what the third row represents. The fourth row has a stem of 0 and two leaves.
One purpose of a stem and leaf display is to clarify the shape of the distribution. You can see many facts about TD passes more easily in Table 1 than in Figure 1. For example, by looking at the stems and the shape of the plot, you can tell that most of the teams had between 10 and 29 passing TDs, with a few having more and a few having less. The precise numbers of TD passes can be determined by examining the leaves.
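The construction of the display can be sketched in Python. The list `td_passes` below is read back from the stems and leaves of Table 1, and the function name `stem_and_leaf` is an illustrative choice. The sketch assumes non-negative whole numbers with the tens digit as the stem.

```python
from collections import defaultdict

# The 31 values recovered from the stems and leaves of Table 1.
td_passes = [37, 33, 33, 32,
             29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
             19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12,
             9, 6]

def stem_and_leaf(values):
    """Return the rows of a stem and leaf display, largest stem first."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # tens digit, units digit
        leaves[stem].append(str(leaf))
    top, bottom = max(leaves), min(leaves)
    # Each stem appears once, even if it has no leaves.
    return [f"{stem}|{''.join(leaves.get(stem, []))}"
            for stem in range(top, bottom - 1, -1)]

for row in stem_and_leaf(td_passes):
    print(row)
# 3|2337
# 2|001112223889
# 1|2244456888899
# 0|69
```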
Tabulating and Graphing Univariate Categorical Data
There are different ways of organizing univariate categorical data:
- The Summary Table
- Bar and Pie Charts, the Pareto Diagram
Tabulating and Graphing Bivariate Categorical Data
Bivariate categorical data can be organized as:
- Contingency Tables
- Side by Side Bar charts
GRAPHICAL EXCELLENCE AND COMMON ERRORS IN PRESENTING DATA
It is important that data is organized in a professional manner and graphical excellence is achieved in its presentation. High quality and attractive graphs can be used to explain and highlight facts which otherwise may go unnoticed in descriptive presentations. That is why all companies in their annual reports use different types of graphs to present data.
TABULATING NUMERICAL DATA:
Group data into classes
In some cases it is necessary to group the values of the data to summarize the data properly.
The process is described below.
Step 1: Sort Raw Data in Ascending Order
Data: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Step 2: Find Range
Range = Maximum value – Minimum value
Thus, Range = 58 – 12 = 46
Step 3: Select Number of Classes
Select the number of classes. Usually between 5 and 15 classes are used. In our example, let us make 5 classes.
Step 4: Compute Class Width
Find the class width by dividing the range by the number of classes and rounding up. Be careful of two things.
a) You must round up, not off. Normally 3.2 would round to 3, but in rounding up it becomes 4.
b) If the range divided by the number of classes gives an integer value (no remainder), then you can either add one to the number of classes or add one to the class width.
In our example:
Class width = Range / Number of classes = 46 / 5 = 9.2
Round up 9.2 to 10.
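Steps 2 to 4 can be sketched in Python as follows. The function name `class_width` is an illustrative choice. For case (b), the sketch takes the option of adding one to the class width, which is only one of the two choices mentioned above, and it assumes a whole-number range.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

def class_width(data_range, num_classes):
    """Divide the range by the number of classes and round up.
    If the division is exact, add one to the width (rule b)."""
    width = math.ceil(data_range / num_classes)
    if data_range % num_classes == 0:
        width += 1
    return width

data_range = max(data) - min(data)
print(data_range)                  # 46
print(class_width(data_range, 5))  # 10
```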
Step 5: Determine Class Boundaries (limits)
Pick a suitable starting point less than or equal to the minimum value. The classes will then cover a span of values equal to the class width times the number of classes. Your starting point is the lower limit of the first class. Continue to add the class width to this lower limit to get the lower limits of the other classes.
In this example, if we start with 10 we will cover 10 × 5 = 50 values, which is close to our range. So let 10 be the lower limit of the first class. Continue to add 10 to this lower limit to get the lower limits of the other classes: 10, 20 (= 10 + 10), 30 (= 20 + 10), 40 (= 30 + 10), 50 (= 40 + 10).
To find the upper limit of the first class, subtract one from the lower limit of the second class. Then continue to add the class width to this upper limit to find the rest of the upper limits.
The upper limit of the first class is 20 – 1 = 19. The remaining upper limits are 29 (= 19 + 10), 39 (= 29 + 10), 49 (= 39 + 10) and 59 (= 49 + 10).
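The rule for the class limits can be sketched in Python. The variable names are illustrative, and the starting point of 10 is the one chosen in the example.

```python
start = 10        # lower limit of the first class
width = 10        # class width from Step 4
num_classes = 5

# Lower limits: keep adding the class width to the starting point.
lower_limits = [start + i * width for i in range(num_classes)]

# First upper limit: one less than the second lower limit,
# then keep adding the class width.
first_upper = lower_limits[1] - 1
upper_limits = [first_upper + i * width for i in range(num_classes)]

print(lower_limits)  # [10, 20, 30, 40, 50]
print(upper_limits)  # [19, 29, 39, 49, 59]
```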
Step 6: Compute Class Midpoints
Class Midpoint = (Lower limit + Upper limit) / 2
First midpoint is (10 + 19)/2 = 14.5
Other midpoints are:
(20 + 29)/2 = 24.5
(30 + 39)/2 = 34.5
(40 + 49)/2 = 44.5
(50 + 59)/2 = 54.5
Depending on what you’re trying to accomplish, it may not be necessary to find the midpoint.
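If the midpoints are needed, the formula can be applied to all five classes at once. This is a minimal Python sketch that uses the class limits from Step 5.

```python
lower_limits = [10, 20, 30, 40, 50]
upper_limits = [19, 29, 39, 49, 59]

# Midpoint of each class = (lower limit + upper limit) / 2
midpoints = [(low + high) / 2 for low, high in zip(lower_limits, upper_limits)]

print(midpoints)  # [14.5, 24.5, 34.5, 44.5, 54.5]
```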
Step 7: Compute Class Intervals
First class: the lower limit is 10 and the upper limit is 19. We can write the first class interval as 10 to 19, or 10 – 19, or “10 but under 20”. In “10 but under 20” a value greater than 19.5 will be treated as above 20.
Similarly, the other 4 class intervals are:
20 – 29
30 – 39
40 – 49
50 – 59
Important points to remember
1. There should be between 5 and 15 classes.
2. Choose an odd number as the class width if you want the class midpoints to be integers instead of decimals.
3. The classes must be mutually exclusive. This means that no data value can fall into two different classes.
4. The classes must be all inclusive or exhaustive. This means that all data values must be included.
5. The classes must be continuous. There should be no gaps in a frequency distribution. Classes that have no values in them must be included (unless they are the first or last classes, which can be dropped).
6. The classes must be equal in width. The exception here is the first or last class. It is possible to have a “below …” as a first class or “… and above” as a last class.
Frequency Distribution: Count Observations & Assign to Class Intervals:
Looking through the data shows that there are three values between 10 and 19. Hence the frequency of this class is 3. Similarly, the frequencies of the other class intervals can be found as follows:
10 – 19 : 3
20 – 29 : 6
30 – 39 : 5
40 – 49 : 4
50 – 59 : 2
Total frequency = 3 + 6 + 5 + 4 + 2 = 20
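The counting can be sketched in Python as follows. It uses the sorted data from Step 1 and the class limits from Step 5, and the variable names are illustrative.

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# (lower limit, upper limit) of each class interval.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]

# Count how many observations fall inside each interval (limits included).
frequencies = [sum(low <= value <= high for value in data)
               for low, high in classes]

for (low, high), freq in zip(classes, frequencies):
    print(f"{low} - {high} : {freq}")

print("Total frequency =", sum(frequencies))  # Total frequency = 20
```

Because the classes are mutually exclusive and exhaustive, the frequencies add up to the number of observations.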
Relative frequency:
Relative Frequency of a class = Frequency of the class interval / Total Frequency
There are 3 observations in the first class interval 10 – 19, so its relative frequency is 3/20 = 0.15. The relative frequencies of the other class intervals are calculated in the same way.
Percent Relative Frequency:
If we multiply 0.15 by 100, the percent relative frequency of 15% is obtained.
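Both quantities can be computed for every class with a short Python sketch. The frequencies are the ones found above. The percentages are computed as 100 × frequency / total, which gives the same result as multiplying the relative frequency by 100.

```python
labels = ["10 - 19", "20 - 29", "30 - 39", "40 - 49", "50 - 59"]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)  # 20

relative = [freq / total for freq in frequencies]
percent = [100 * freq / total for freq in frequencies]

for label, rel, pct in zip(labels, relative, percent):
    print(f"{label} : {rel:.2f} ({pct:.0f}%)")
# 10 - 19 : 0.15 (15%)
# 20 - 29 : 0.30 (30%)
# 30 - 39 : 0.25 (25%)
# 40 - 49 : 0.20 (20%)
# 50 - 59 : 0.10 (10%)
```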
Cumulative Frequency:
If we add the frequency of the second interval to the frequency of the first interval, the cumulative frequency for the second interval is obtained. The cumulative frequency of each class interval is calculated below.
10 – 19 : 3
20 – 29 : 3 + 6 = 9
30 – 39 : 3 + 6 + 5 = 14
40 – 49 : 3 + 6 + 5 + 4 = 18
50 – 59 : 3 + 6 + 5 + 4 + 2 = 20
Percent cumulative relative frequency
This is calculated in the same way as the cumulative frequency, except that the percent relative frequency of each class interval is added instead of the frequency. The percent cumulative relative frequency of the last class interval is 100%, as all observations have been added. The percent cumulative relative frequency of each class interval is calculated below.
10 – 19 : 15
20 – 29 : 15 + 30 = 45
30 – 39 : 15 + 30 + 25 = 70
40 – 49 : 15 + 30 + 25 + 20 = 90
50 – 59 : 15 + 30 + 25 + 20 + 10 = 100
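Running totals of this kind can be sketched in Python with `itertools.accumulate`, which returns the partial sums of a sequence. The frequencies are the ones from the frequency distribution above.

```python
from itertools import accumulate

frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)  # 20

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Percent relative frequency of each class, then its running total.
percent = [100 * freq / total for freq in frequencies]
percent_cumulative = list(accumulate(percent))

print(cumulative)          # [3, 9, 14, 18, 20]
print(percent_cumulative)  # [15.0, 45.0, 70.0, 90.0, 100.0]
```

The last cumulative frequency equals the total number of observations, and the last percent cumulative relative frequency equals 100.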