Statistics and Probability
PEARSON’S COEFFICIENT OF SKEWNESS
Pearson’s coefficient of skewness = 3(mean − median) / standard deviation
As you can see, this coefficient involves the calculation of the mean as well as the standard deviation. Actually, the numerator is divided by the standard deviation in order to obtain a pure number. If the analysis of a dataset is being undertaken using the median and quartiles alone, then we use a measure called Bowley’s coefficient of skewness.
The advantage of this particular formula is that it requires NO KNOWLEDGE of the MEAN or STANDARD DEVIATION. In an asymmetrical distribution, the quartiles will NOT be equidistant from the median, and the AMOUNT by which each one deviates will give an indication of skewness. Where the distribution is positively skewed, Q1 will be closer to the median than Q3. In other words, the distance between Q3 and the median will be greater than the distance between the median and Q1.
POSITIVE SKEWNESS
Hence, if we subtract the distance (median − Q1) from the distance (Q3 − median), we will obtain a positive answer. In the case of a positively skewed distribution:

(Q3 − median) − (median − Q1) > 0
i.e. Q1 + Q3 − 2(median) > 0

The opposite is true for skewness to the left.
NEGATIVE SKEWNESS
In this case:

(Q3 − median) − (median − Q1) < 0
i.e. Q1 + Q3 − 2(median) < 0

The gist of the above discussion is that in the case of a positively skewed distribution, the quantity Q1 + Q3 − 2X̃ (where X̃ denotes the median) will be positive, whereas in the case of a negatively skewed distribution, this quantity will be negative. A RELATIVE measure of skewness is obtained by dividing Q1 + Q3 − 2X̃ by the interquartile range, i.e. Q3 − Q1, so that Bowley’s coefficient of skewness is given by:

Bowley’s coefficient of skewness = (Q1 + Q3 − 2X̃) / (Q3 − Q1)
It is a pure (unitless) number, and its value lies between −1 and +1. For a positively skewed distribution, this coefficient will turn out to be positive, and for a negatively skewed distribution this coefficient will come out to be negative. Let us apply this concept to the example regarding the ages of children of the manual and non-manual workers that we considered in the last lecture.
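As a quick sanity check, both coefficients can be computed in a few lines of Python. The function names and the small dataset below are invented purely for illustration:

```python
import statistics

def pearson_skewness(data):
    """3(mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    stdev = statistics.pstdev(data)  # population standard deviation
    return 3 * (mean - median) / stdev

def bowley_skewness(q1, median, q3):
    """(Q1 + Q3 - 2*median) / (Q3 - Q1): needs only the quartiles and the median."""
    return (q1 + q3 - 2 * median) / (q3 - q1)

# A perfectly symmetric dataset gives both coefficients equal to zero:
data = [2, 4, 6, 8, 10]
print(pearson_skewness(data))    # 0.0
print(bowley_skewness(4, 6, 8))  # 0.0
```

Note that Bowley’s coefficient is unaffected by extreme values in the tails, since it uses only the quartiles and the median.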
EXAMPLE: Sample statistics pertaining to the ages of children of manual and non-manual workers:

Statistic | Children of Manual Workers | Children of Non-Manual Workers |
Mean | 8.50 years | 8.50 years |
Standard deviation | 3.61 years | 3.61 years |
Median | 8.50 years | 9.16 years |
Q1 | 6.00 years | 5.50 years |
Q3 | 11.00 years | 10.83 years |
Quartile deviation | 2.50 years | 2.66 years |
The statistics pertaining to children of manual workers yield the following PICTURE:

[Figure: frequency curve of the ages of children of manual workers, with Q1 = 6.0 and Q3 = 11.0 equidistant from the median]
On the other hand, the statistics pertaining to children of non-manual workers yield the following PICTURE:

[Figure: frequency curve of the ages of children of non-manual workers, with Q1 = 5.5, the median X̃, and Q3 marked]
The diagram pertaining to children of non-manual workers clearly shows that the distance between the median and Q1 is much greater than the distance between Q3 and the median, which happens whenever we are dealing with a negatively skewed distribution. If we compute Bowley’s coefficient of skewness for each of these two datasets, we obtain:
Bowley’s Coefficient of Skewness

Ages of children of manual workers:
(11.00 + 6.00 − 2 × 8.50) / (11.00 − 6.00) = 0 / 5.00 = 0

Ages of children of non-manual workers:
(10.83 + 5.50 − 2 × 9.16) / (10.83 − 5.50) = −1.99 / 5.33 = −0.37
As you have noticed, for the children of the manual workers, Bowley’s coefficient has come out to be zero, whereas for the children of the non-manual workers, the coefficient has come out to be negative. This indicates that the distribution of the ages of the children of manual workers is symmetrical, whereas the distribution of the ages of the children of the non-manual workers is negatively skewed, EXACTLY the same conclusion that we obtained when we computed Pearson’s coefficient of skewness.
KURTOSIS
The term kurtosis was introduced by Karl Pearson. This word literally means ‘the amount of hump’, and is used to represent the degree of PEAKEDNESS or flatness of a unimodal frequency curve. When the values of a variable are closely BUNCHED round the mode in such a way that the peak of the curve becomes relatively narrow and tall, we say that the curve is LEPTOKURTIC.
On the other hand, if the curve is flat-topped, we say that the curve is PLATYKURTIC.
The NORMAL curve is a curve which is neither very peaked nor very flat, and hence it is taken as A BASIS FOR COMPARISON. The normal curve itself is called MESOKURTIC. I will discuss the normal distribution in detail when we discuss continuous probability distributions. At the moment, just think of the three symmetric, hump-shaped curves shown below:

[Figure: three symmetric hump-shaped curves of differing peakedness]
The tallest one is called leptokurtic, the intermediate one is called mesokurtic, and the flat one is called platykurtic. The question arises, “How will we MEASURE the degree of peakedness or kurtosis of a dataset?” A MEASURE of kurtosis based on quartiles and percentiles is
K = Q.D. / (P90 − P10),

where Q.D. denotes the quartile deviation (i.e. the semi-interquartile range) and P10 and P90 denote the 10th and 90th percentiles.
This is known as the PERCENTILE COEFFICIENT OF KURTOSIS. It has been shown that K for a normal distribution is 0.263, and that it lies between 0 and 0.50. In the case of a leptokurtic distribution, the percentile coefficient of kurtosis comes out to be LESS THAN 0.263, and in the case of a platykurtic distribution, it comes out to be GREATER THAN 0.263. The next concept that I am going to discuss with you is the concept of moments, a MATHEMATICAL concept, and a very important concept in statistics.
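The value 0.263 for the normal distribution is easy to check empirically. The sketch below, assuming NumPy is available, estimates the percentile coefficient of kurtosis from a large simulated normal sample:

```python
import numpy as np

def percentile_kurtosis(data):
    """K = Q.D. / (P90 - P10), the percentile coefficient of kurtosis."""
    q1, q3 = np.percentile(data, [25, 75])
    p10, p90 = np.percentile(data, [10, 90])
    qd = (q3 - q1) / 2  # quartile deviation (semi-interquartile range)
    return qd / (p90 - p10)

rng = np.random.default_rng(0)
sample = rng.normal(size=100_000)
print(round(percentile_kurtosis(sample), 3))  # close to 0.263 for a normal sample
```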
MOMENTS
A moment designates the power to which deviations are raised before averaging them. For example, the quantity
(1/n) Σ(xi − x̄)¹ = (1/n) Σ(xi − x̄)
is called the first sample moment about the mean, and is denoted by m1. Similarly, the quantity
(1/n) Σ(xi − x̄)²
is called the second sample moment about the mean, and is denoted by m2. In general, the rth moment about the mean is the arithmetic mean of the rth power of the deviations of the observations from the mean. In symbols, this means that
mr = (1/n) Σ(xi − x̄)^r, for sample data.
Moments about the mean are also called the central moments or the mean moments. In a similar way, moments about an arbitrary origin, say α, are defined by the relation
m′r = (1/n) Σ(xi − α)^r, for sample data.
For r =1, we have
m1 = (1/n) Σ(xi − x̄) = (Σ xi)/n − x̄ = x̄ − x̄ = 0,
and
m′1 = (1/n) Σ(xi − α) = (Σ xi)/n − α = x̄ − α.
Putting r = 2 in the relation for mean moments, we see that
m2 = (1/n) Σ(xi − x̄)²
which is exactly the same as the sample variance. If we take the positive square root of this quantity, we obtain the standard deviation. In the formula,
m′r = (1/n) Σ(xi − α)^r

if we put α = 0, we obtain

m′r = (1/n) Σ xi^r
and this is called the rth moment about zero, or the rth moment about the origin. Let us now consolidate the idea of moments by considering an example.
EXAMPLE
Calculate the first four moments about the mean for the following set of examination marks: 45, 32, 37, 46, 39, 36, 41, 48 and 36. For convenience, the observed values are written in an increasing sequence. The necessary calculations appear in the table below:
xi | xi – x̄ | (xi – x̄)² | (xi – x̄)³ | (xi – x̄)⁴ |
32 | – 8 | 64 | – 512 | 4096 |
36 | – 4 | 16 | – 64 | 256 |
36 | – 4 | 16 | – 64 | 256 |
37 | – 3 | 9 | – 27 | 81 |
39 | – 1 | 1 | – 1 | 1 |
41 | 1 | 1 | 1 | 1 |
45 | 5 | 25 | 125 | 625 |
46 | 6 | 36 | 216 | 1296 |
48 | 8 | 64 | 512 | 4096 |
360 | 0 | 232 | 186 | 10708 |
Now x̄ = Σ xi / n = 360/9 = 40 marks.

Therefore

m1 = Σ(xi − x̄) / n = 0/9 = 0
m2 = Σ(xi − x̄)² / n = 232/9 = 25.78 (marks)²
m3 = Σ(xi − x̄)³ / n = 186/9 = 20.67 (marks)³
m4 = Σ(xi − x̄)⁴ / n = 10708/9 = 1189.78 (marks)⁴
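The hand computation above can be checked with a few lines of Python:

```python
# Recomputing the first four mean-moments for the marks example.
marks = [32, 36, 36, 37, 39, 41, 45, 46, 48]
n = len(marks)
xbar = sum(marks) / n  # 360 / 9 = 40

def mean_moment(data, r):
    """rth sample moment about the mean: (1/n) * sum((x - xbar)**r)."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)

print(xbar)                             # 40.0
print(mean_moment(marks, 1))            # 0.0
print(round(mean_moment(marks, 2), 2))  # 25.78
print(round(mean_moment(marks, 3), 2))  # 20.67
print(round(mean_moment(marks, 4), 2))  # 1189.78
```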
All the formulae that I have discussed until now pertain to the case of raw data. How will we compute the various moments in the case of grouped data?
MOMENTS IN THE CASE OF GROUPED DATA
When the sample data are grouped into a frequency distribution having k classes with midpoints x1, x2, …, xk and the corresponding frequencies f1, f2, …, fk (Σ fi = n), the rth sample moments are given by
mr = (1/n) Σ fi (xi − x̄)^r, and
m′r = (1/n) Σ fi (xi − α)^r.
In the calculation of moments from a grouped frequency distribution, an error is introduced by the assumption that the frequencies associated with a class are located at the MIDPOINT of the class interval. You remember the concept of grouping error that I discussed with you in an earlier lecture? Our moments therefore need corrections. These corrections were introduced by W.F. Sheppard, and hence they are known as SHEPPARD’S CORRECTIONS: Sheppard’s Corrections for Grouping Error: It has been shown by W.F. Sheppard that, if the frequency distribution (i) is continuous and (ii) tails off to zero at each end, the corrected moments are as given below:
m2 (corrected) = m2 (uncorrected) − h²/12;
m3 (corrected) = m3 (uncorrected);
m4 (corrected) = m4 (uncorrected) − (h²/2) · m2 (uncorrected) + (7/240) · h⁴;
where h denotes the uniform class interval. The important point to note here is that these corrections are NOT applicable to highly skewed distributions or to distributions having unequal class intervals. I am now going to discuss with you certain mathematical RELATIONSHIPS that exist between the moments about the mean and the moments about an arbitrary origin. The reason for doing so is that, in many situations, it is easier to calculate the moments in the first instance about an arbitrary origin. They are then transformed to the mean moments using the relationships that I am now going to convey to you. The equations are:
m1 = 0;
m2 = m′2 − (m′1)²;
m3 = m′3 − 3m′2m′1 + 2(m′1)³; and
m4 = m′4 − 4m′3m′1 + 6m′2(m′1)² − 3(m′1)⁴.
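These relationships can also be verified numerically. The sketch below uses an arbitrary small dataset and an arbitrary origin, both invented purely for illustration:

```python
# Numerical check of the relations between the mean-moments m_r and the
# moments m'_r about an arbitrary origin.
data = [3, 5, 7, 8, 12]
n = len(data)
a = 6  # arbitrary origin, chosen for illustration
xbar = sum(data) / n

m = {r: sum((x - xbar) ** r for x in data) / n for r in (1, 2, 3, 4)}
mp = {r: sum((x - a) ** r for x in data) / n for r in (1, 2, 3, 4)}

# Each identity should hold up to floating-point rounding:
assert abs(m[2] - (mp[2] - mp[1] ** 2)) < 1e-9
assert abs(m[3] - (mp[3] - 3 * mp[2] * mp[1] + 2 * mp[1] ** 3)) < 1e-9
assert abs(m[4] - (mp[4] - 4 * mp[3] * mp[1]
                   + 6 * mp[2] * mp[1] ** 2 - 3 * mp[1] ** 4)) < 1e-9
print("relations verified")
```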
In this course, I will not be discussing the mathematical derivation of these relationships. You are welcome to study the mathematics behind these formulae if you are interested. (The derivation is available in your own textbook.) But I would like to give you two tips for remembering these formulae:
• In each of these relations, the sum of the coefficients of the various terms on the right-hand side equals zero; and
• Each term on the right is of the same dimension as the term on the left.

Let us now apply these concepts to an example:
EXAMPLE
Compute the first four moments for the following distribution of marks after applying Sheppard’s corrections:
Marks out of 20 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
No. of Students | 1 | 2 | 5 | 10 | 20 | 51 | 22 | 11 | 5 | 3 | 1
If we wish to compute the first four moments about the mean by the direct method, first of all we will have to compute the mean itself. The mean of this particular dataset comes out to be 10.06. But 10.06 is not a very convenient number to work with! This is so because, when we construct the columns of X − X̄, (X − X̄)², etc., we will have a lot of decimals. An alternative way of computing the moments is to take a convenient number as the arbitrary origin and to compute the moments about this number. Later, we utilize the relationships between the moments about the mean and the moments about the arbitrary origin in order to find the moments about the mean. In this example, we may select 10 as the arbitrary origin, which is the X-value corresponding to the highest frequency 51, and construct the column of D, which is the same as X − 10. Next, we compute the columns of fD, fD², fD³, and so on.
Marks (xi) | No. of Students (fi) | Di = xi – 10 | fiDi | fiDi² | fiDi³ | fiDi⁴ |
5 | 1 | – 5 | – 5 | 25 | – 125 | 625 |
6 | 2 | – 4 | – 8 | 32 | – 128 | 512 |
7 | 5 | – 3 | – 15 | 45 | – 135 | 405 |
8 | 10 | – 2 | – 20 | 40 | – 80 | 160 |
9 | 20 | – 1 | – 20 | 20 | – 20 | 20 |
10 | 51 | 0 | 0 | 0 | 0 | 0 |
11 | 22 | 1 | 22 | 22 | 22 | 22 |
12 | 11 | 2 | 22 | 44 | 88 | 176 |
13 | 5 | 3 | 15 | 45 | 135 | 405 |
14 | 3 | 4 | 12 | 48 | 192 | 768 |
15 | 1 | 5 | 5 | 25 | 125 | 625 |
Sum | 131 | .. | 8 | 346 | 74 | 3718 |
Sum ÷ n | 1 | .. | 0.06 = m′1 | 2.64 = m′2 | 0.56 = m′3 | 28.38 = m′4 |
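The column sums in this table, the conversion to mean-moments, and Sheppard's corrections (with h = 1, since the class interval is one mark) can all be reproduced in a short script. Note that exact arithmetic gives values differing very slightly from the rounded hand computation:

```python
# Grouped-data moments about the origin alpha = 10 for the marks table,
# followed by conversion to mean-moments and Sheppard's corrections.
marks = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
freq  = [1, 2, 5, 10, 20, 51, 22, 11, 5, 3, 1]
n = sum(freq)  # 131
D = [x - 10 for x in marks]

# Moments about the arbitrary origin 10:
mp = {r: sum(f * d ** r for f, d in zip(freq, D)) / n for r in (1, 2, 3, 4)}

# Mean-moments via the conversion relations:
m2 = mp[2] - mp[1] ** 2
m3 = mp[3] - 3 * mp[2] * mp[1] + 2 * mp[1] ** 3
m4 = mp[4] - 4 * mp[3] * mp[1] + 6 * mp[2] * mp[1] ** 2 - 3 * mp[1] ** 4

# Sheppard's corrections with class interval h = 1:
h = 1
m2_c = m2 - h ** 2 / 12
m3_c = m3
m4_c = m4 - (h ** 2 / 2) * m2 + 7 * h ** 4 / 240
print(round(m2_c, 2), round(m3_c, 2), round(m4_c, 2))
```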
Moments about the mean are:
m1 = 0
m2 = m′2 − (m′1)² = 2.64 − (0.06)² = 2.64
m3 = m′3 − 3m′2m′1 + 2(m′1)³ = 0.56 − 3(2.64)(0.06) + 2(0.06)³ = 0.08
m4 = m′4 − 4m′3m′1 + 6m′2(m′1)² − 3(m′1)⁴ = 28.38 − 4(0.56)(0.06) + 6(2.64)(0.06)² − 3(0.06)⁴ = 28.30

Applying Sheppard’s corrections (with h = 1), we have
m2 (corrected) = m2 (uncorrected) − h²/12 = 2.64 − 0.08 = 2.56,
m3 (corrected) = m3 (uncorrected) = 0.08,
m4 (corrected) = m4 (uncorrected) − (h²/2) · m2 (uncorrected) + (7/240) · h⁴ = 28.30 − 1.32 + 0.03 = 27.01

I have discussed with you in quite a lot of detail the concept of moments. The question arises, “Why is it that we are going through all these lengthy calculations? What is the significance of computing moments?” You will obtain the answer to this question when I discuss with you the concept of moment ratios. There are certain ratios in which both the numerators and the denominators are moments. The most common of these moment ratios are denoted by b1 and b2, and defined by the relations:
MOMENT RATIOS:
b1 = (m3)² / (m2)³ and b2 = m4 / (m2)²

(in the case of sample data). They are independent of origin and units of measurement, i.e. they are pure numbers. b1 is used to measure the skewness of our distribution, and b2 is used to measure its kurtosis.
INTERPRETATION OF b1
For symmetrical distributions, b1 is equal to zero. Hence, if, for any dataset, b1 comes out to be zero, we can conclude that our distribution is symmetric. It should be noted that the measure which indicates the direction of skewness is the third moment about the mean.
If our distribution is positively skewed, m3 will be positive, and if our distribution is negatively skewed, m3 will be negative. b1 will turn out to be positive in both situations because it is given by

b1 = (m3)² / (m2)³

(Since m3 is being squared, b1 will be positive regardless of the sign of m3.)
INTERPRETATION OF b2
For the normal distribution, b2 = 3. For a leptokurtic distribution, b2 > 3, and for a platykurtic distribution, b2 < 3. You have noted that the third and fourth moments about the mean provide information about the skewness and the kurtosis of our dataset. This is so because m3 occurs in the numerator of b1, and m4 occurs in the numerator of b2. What about the dispersion and the centre of our dataset? Do you not remember that the second moment about the mean is exactly the same thing as the variance, the positive square root of which is the standard deviation, the most important measure of dispersion? What about the centre of the distribution? You will be interested to note that the first moment about zero is NONE OTHER than the arithmetic mean! This is so because

(1/n) Σ(xi − 0) = (1/n) Σ xi = x̄,

none other than the arithmetic mean! In this way, the first four moments play a KEY role in describing frequency distributions.
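The two moment ratios are easy to compute for the ungrouped marks example considered earlier (where m2 = 25.78, m3 = 20.67 and m4 = 1189.78); a minimal sketch:

```python
# Moment ratios b1 = m3^2 / m2^3 and b2 = m4 / m2^2 for the marks example.
marks = [32, 36, 36, 37, 39, 41, 45, 46, 48]
n = len(marks)
xbar = sum(marks) / n

def m(r):
    """rth sample moment about the mean."""
    return sum((x - xbar) ** r for x in marks) / n

b1 = m(3) ** 2 / m(2) ** 3
b2 = m(4) / m(2) ** 2
print(round(b1, 3), round(b2, 3))
# b1 near zero suggests near-symmetry; b2 below 3 suggests a platykurtic shape.
```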