Quantitative Lecture-15

Statistics

Very Very Very Important Rules:

1. Adding the same value to, or subtracting the same value from, every element of a set has no effect on the standard deviation of the set.

For instance,

A = {2, 4, 6, 8}

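The standard deviation of this set is √5 ≈ 2.24 (the mean is 5, the squared differences from the mean are 9 + 1 + 1 + 9 = 20, and 20 divided by 4 elements gives 5, whose square root is √5). If we add 1 to each element of this set, the new set becomes

B = {3, 5, 7, 9}

The standard deviation of this new set is still √5 ≈ 2.24.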
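
Rule 1 can be checked numerically with a minimal sketch like the one below. It assumes the population standard deviation (dividing by the number of elements), which is what pstdev in Python's standard statistics module computes; the variable names are only illustrative.

```python
import math
from statistics import pstdev

# Set A from the example, and set B obtained by adding 1 to every element.
A = [2, 4, 6, 8]
B = [x + 1 for x in A]

sd_A = pstdev(A)  # population standard deviation (divide by n)
sd_B = pstdev(B)

# Shifting every element by the same amount leaves the spread unchanged.
print(sd_A, sd_B)
print(math.isclose(sd_A, sd_B))
```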

2. Multiplying or dividing every element of a set by the same positive value multiplies or divides the standard deviation by that same value.

For instance,

A = {2, 4, 6, 8}

This set has standard deviation √5 ≈ 2.24.

If we divide each element of this set by 2, we’ll get a new set

B = {1, 2, 3, 4}

Now the standard deviation of this new set is √5 ⁄ 2 ≈ 1.12; so the standard deviation is also divided by 2 to give the standard deviation of the new set.

Let’s work through an example to learn how the above rule is used.

Five containers hold different amounts of water, and the standard deviation of the water levels of these containers is 10. Due to sunlight, 20% of the water in each container evaporates. What is the standard deviation of the water levels after the evaporation?

Solution:

As we learned in the percentage topic, 20% of the water evaporating means 80% of the water remains. So each container in this set is left with 0.8 times its water (i.e. each element is multiplied by 0.8), and therefore the 2nd rule applies as follows:

Old standard deviation × 0.8 = New standard deviation

⇒ New standard deviation = 10 × 0.8 = 8 Answer.
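
Rule 2 can be checked the same way. This is only a sketch: it assumes the population standard deviation (pstdev from Python's statistics module), and the five water levels are hypothetical, because the question gives only their standard deviation and not the individual levels. What matters is the ratio between the new and old standard deviation.

```python
import math
from statistics import pstdev

# Rule 2 on the set from the text: divide every element by 2.
A = [2, 4, 6, 8]
B = [x / 2 for x in A]
ratio = pstdev(B) / pstdev(A)  # expected to be 1/2

# Water example with HYPOTHETICAL levels (not given in the question).
levels = [40, 55, 60, 72, 90]
after = [0.8 * w for w in levels]  # 20% evaporates, 80% remains
water_ratio = pstdev(after) / pstdev(levels)  # expected to be 0.8

print(ratio, water_ratio)
```

Whatever the actual levels are, the ratio stays 0.8, which is why the answer needs only the old standard deviation.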

Now, let’s look at a harder scenario, where a larger data set is given in tabular form rather than as a small set.

Number of Students	Daily Pocket money
3	200
5	300
8	500
14	1000

If the data is given in tabular form rather than as a set like {a, b, c, ….}, and the Mean, Median or Standard Deviation is required, how can we solve it in a very short time?

Let’s learn how to do so. The first step is to identify two things: the frequency and the data.

In the table above, first decide whether Number of Students is the frequency or the data. Of the two variables (i.e. Number of Students and Daily Pocket money), one is always the frequency and the other is always the data.

You can easily find the frequency by using common sense. Tell me, should we say ‘Daily Pocket money per Student’ or ‘Number of students per Daily Pocket money’?

Obviously, the former is correct, and the latter doesn’t make sense. Now always remember that the frequency always comes in the denominator, i.e. it is the variable that comes after ‘per’, which is the verbal translation of division. The former statement says ‘Daily Pocket money per Student’, which can be written as (Daily Pocket money) ⁄ Student. So the variable named ‘Number of Students’ must be the frequency.

This was the only hard part you need to understand; the rest is ‘halwa’ (i.e. very easy :))

Just take the sum (i.e. Σ) of the frequency column, which is,

Σ Number of students = Σ (ƒ) = 3 + 5 + 8 + 14 = 30

You always need to do the above steps in order to find either the Mean or the Median.

First, let’s find the Mean of the data given in tabular form:

The mean is obtained by dividing the sum by the count. So first let’s find the sum of the data in the table. As 200 repeats 3 times, we can either add 200 three times or just multiply 200 by 3. We’ll do the same for the other rows to find the sum of the data.

Sum of data is:

3 × 200 = 600
5 × 300 = 1500
8 × 500 = 4000
14 × 1000 = 14000
Σ = 20100

Now we have the sum and need the count, which is the sum of the frequencies, because, as I told you, the frequency always comes in the denominator. So by dividing the sum (i.e. 20100) by the frequency total (i.e. 30), we’ll get the Mean of the tabular data mentioned above:

Mean = Average = 20100 ⁄ 30 = 670 Answer.
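
The row-by-row sum can be sketched in a few lines of Python. The list name rows and the (frequency, value) layout are just an illustrative way to hold the table.

```python
# (frequency, value) rows from the table: number of students, pocket money.
rows = [(3, 200), (5, 300), (8, 500), (14, 1000)]

total_count = sum(f for f, _ in rows)    # sum of the frequency column
total_sum = sum(f * x for f, x in rows)  # each value counted f times
mean = total_sum / total_count

print(total_count, total_sum, mean)
```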

Now, let’s find the Median of the data in the table above, which is rather easy. 🙂

Remember, if the sum of the frequency column is an even integer (as in the above case, where it’s 30), always take half of the sum. After that, do the following steps to find the Median:

As half of 30 is 15,

Median = Average of the 15th value and the 16th value in the table.

So in short, when the frequency sum is even, divide it by 2. After that, take the average of the value at that position and the next value. This will be the median.

So, in the above case, look at the frequency column, count from the top, and add up the frequencies until you reach the 15th and 16th values. For instance, 3 + 5 = 8, and adding the next frequency, 8, gives 16. So the 9th through 16th values all fall in this row, which means the 15th value is the same as the 16th value. The data in this row is 500, so the 15th value and the 16th value are both 500, and their average is 500.

For those who are not able to understand what happened here, let’s convert the table into set form below:

As 200 repeats 3 times (3 being frequency) and 300 repeats 5 times (5 being frequency) and so on…

{200, 200, 200, 300, 300, 300, 300, 300, ……….}

Here the number of elements in the set equals the sum of the frequencies (200 appears 3 times, 300 appears 5 times, and so on). If the number of elements in a set is even, there is no single middle element; there are two middle elements, and we take their average. That is what we did above, in a similar fashion.

We find the two middle elements by dividing the total number of elements (i.e. the sum of frequencies, 30) by 2. If this gives an integer, the number of elements is even. The two middle elements are then the element at that position (i.e. the 15th value) and the next element (i.e. the 16th value). From the table, the 15th value is 500, and the 16th value is also 500, because both fall in the row with frequency 8. This way of counting through the frequencies is known as cumulative frequency, which we’ll discuss later in this topic.

Counting from the set would be hard if the frequency total were, let’s say, 400. So instead of converting the table into a set, just use this short method to find the median.

So Median = (15th value + 16th value) ⁄ 2 = (500 + 500) ⁄ 2 = 500 Answer.
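
The short method can be sketched as a walk down the frequency column. The function names value_at and table_median are hypothetical, and the odd-count branch uses the usual single-middle-value rule, which the example above does not need.

```python
def value_at(rows, position):
    """Return the value at a 1-based position in the expanded, sorted data.

    rows is a list of (frequency, value) pairs sorted by value.
    """
    running = 0  # cumulative frequency so far
    for freq, value in rows:
        running += freq
        if position <= running:
            return value
    raise ValueError("position is larger than the total frequency")


def table_median(rows):
    n = sum(freq for freq, _ in rows)
    if n % 2 == 0:
        # Even count: average the (n/2)-th and (n/2 + 1)-th values.
        return (value_at(rows, n // 2) + value_at(rows, n // 2 + 1)) / 2
    # Odd count (assumed standard rule): the single middle value.
    return value_at(rows, (n + 1) // 2)


rows = [(3, 200), (5, 300), (8, 500), (14, 1000)]
median = table_median(rows)
print(value_at(rows, 15), value_at(rows, 16), median)
```

Because the walk never expands the table into a full set, it works just as well when the frequency total is in the hundreds.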

Percentile:

In contrast to a percentage, a percentile depends only on the number of things (the count), not on their values. For instance,

Suppose 5 people took an exam in which Mr. A was the highest scorer. The percentile of A would be as follows:

Percentile of A = (Number of people who scored less than Mr. A ⁄ Total number of people) × 100

⇒ = (4 ⁄ 5) × 100

⇒ = 80% Answer.

The second thing you must remember about percentile is that as the total number of students increases, the percentile becomes more and more accurate. As you see in the above question, the highest scorer has a percentile of 80%, which is not accurate. But if the total number of students increases to 1000, the highest scorer’s percentile becomes 99.9%, and with even more students it gets closer still to 100%, which is accurate and realistic.

Also remember, percentile can never be 100%, because a person is never counted among the people who scored less than himself.
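
A minimal sketch of this percentile formula follows. The function name percentile_of and the five scores are hypothetical; only the counts (4 people below Mr. A, 5 people in total) come from the example.

```python
def percentile_of(score, all_scores):
    """Percent of people whose score is strictly below the given score."""
    below = sum(1 for s in all_scores if s < score)
    return below * 100 / len(all_scores)


# HYPOTHETICAL scores for the five people; Mr. A has the highest (95).
scores = [60, 72, 81, 88, 95]
top_of_5 = percentile_of(95, scores)

# The same top scorer in a HYPOTHETICAL group of 1000 distinct scores.
big_group = list(range(1, 1001))
top_of_1000 = percentile_of(1000, big_group)

print(top_of_5, top_of_1000)
```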

Normal Distribution Curve:

The normal distribution curve is one of the most important parts of Statistics in the GMAT and GRE exams.

If you see words like ‘data is normally distributed’ or ‘normally distributed set’ in a question, always draw a curve to solve it. Normal distribution questions mostly ask for a percentile, a new standard deviation, the mean, the median, etc. The normal distribution curve makes such questions quite easy to solve. Let’s learn it in different scenarios.

[Figure: Stats 1 (normal distribution curve with µ and σ marked on the horizontal line)]

Where

σ = Standard Deviation of the whole data set (let’s suppose)

And

µ = Mean of the whole data set (let’s suppose)

The horizontal line marked with µ and σ is basically a number line, whose values increase from left to right at constant intervals.

You should remember the percentage distribution of data in each fragment of the normal distribution curve. About 34% of the overall data lies between the Mean and 1 standard deviation above the mean. Similarly, about 34% of the overall data lies between the Mean and 1 standard deviation below the mean. About 14% lies between 1 and 2 standard deviations on each side of the mean, and about 2% lies beyond 2 standard deviations on each side.

Also remember that in a normal distribution curve, Mean = Median = Mode

Let’s understand this by using GMAT or GRE level questions.

In a Math class, test scores are normally distributed and the standard deviation of the class scores is 2.5. What is the mean score of the Math class, if the student at the 84th percentile has a score of 162 out of 170?

Solution:

If you add up the percentages of data from left to right, you get 2% + 14% + 34% + 34% = 84%

So the point (µ + σ) is at the 84th percentile. As we’ve learned, a percentile shows the percent of data that lies below a point, so 84% of the data lies below the point (µ + σ) in the normal distribution curve above. Therefore this point must be 162 according to the information.

⇒ µ + σ = Mean + S.D = 162

While, it’s given that S.D = 2.5

⇒ Mean = 162 – 2.5 = 159.5 Answer.
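
The answer can be sanity-checked with NormalDist from Python's standard statistics module. This is a sketch; the variable names are illustrative, and the exact share below µ + σ is about 84.1%, which the curve rounds to 84%.

```python
from statistics import NormalDist

sd = 2.5
score_at_84th = 162

# The 84th percentile sits about one standard deviation above the mean.
mean = score_at_84th - sd

# Check: share of a normal distribution that lies below (mean + 1 sd).
share_below = NormalDist(mu=mean, sigma=sd).cdf(score_at_84th)
print(mean, round(share_below * 100, 1))
```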

Very Important Point:

Consider, for instance, the point Mean + 1.5(S.D) = µ + 1.5σ on a normal distribution curve.

This point lies exactly in the middle of (µ + σ) and (µ + 2σ).

The percentile of (µ + σ) is 84% and that of (µ + 2σ) is 98%, but the percentile of (µ + 1.5σ) is not the midpoint of 84% and 98% (i.e. it is not 91%). The reason is explained below:

[Figure: Stats 2 (the region of the normal distribution curve between µ + σ and µ + 2σ, split at µ + 1.5σ)]

You can see that the midpoint on the number line doesn’t give the midpoint of the percentiles, because the height of the wave-like graph is not constant, unlike the intervals on the horizontal line below the graph. So, within the 84th to 98th percentile region of the normal distribution curve, the region with dotted lines has more area than the region without dotted lines. That is a very important point as well.
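
To put numbers on this, here is a small sketch using NormalDist from Python's statistics module. The exact percentiles (about 84.1 and 97.7) differ slightly from the rounded 84 and 98 used with the curve; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_1 = z.cdf(1.0) * 100   # percentile at mean + 1 sd
p_15 = z.cdf(1.5) * 100  # percentile at mean + 1.5 sd
p_2 = z.cdf(2.0) * 100   # percentile at mean + 2 sd
midpoint = (p_1 + p_2) / 2

print(round(p_1, 1), round(p_15, 1), round(p_2, 1), round(midpoint, 1))
```

The percentile at µ + 1.5σ comes out near 93, above the midpoint of about 91, because the slice between µ + σ and µ + 1.5σ holds more area than the slice between µ + 1.5σ and µ + 2σ.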

Interquartile Range:

Interquartile range is rarely asked in GMAT and GRE exams, but rarely doesn’t mean never. So let’s learn this.

Consider the set below, whose interquartile range is required.

{2, 2, 3, 4, 4, 6, 6, 8, 12, 12, 12, 15, 15, 16, 18}

What is the interquartile range of this set?

Solution:

First, divide the set into two parts with an equal number of elements, as below:

{(2, 2, 3, 4, 4, 6, 6), 8, (12, 12, 12, 15, 15, 16, 18)}

As the number of elements is odd (i.e. 15), the median (i.e. the middle element) stays out of the two parts to ensure an equal number of elements in both.

Now, further split the two parts into four parts (i.e. quarters of the set) with an equal number of elements each, as below:

{(2, 2, 3), 4, (4, 6, 6), 8, (12, 12, 12), 15, (15, 16, 18)}

Here,

First Quartile (Q₁) = 4

Second Quartile (Q₂) = 8

Third Quartile (Q₃) = 15

And, Fourth Quartile is always the last number of the set i.e

Fourth Quartile (Q₄) = 18

Now consider the case of an even number of elements in the set. Suppose the above set doesn’t have 18, i.e.

{2, 2, 3, 4, 4, 6, 6, 8, 12, 12, 12, 15, 15, 16}

Here we can split this set into two and then into four subsets as follows:

{(2, 2, 3, 4, 4, 6, 6), (8, 12, 12, 12, 15, 15, 16)}

{(2, 2, 3), 4, (4, 6, 6), (8, 12, 12), 12, (15, 15, 16)}

Now, here

Q₁ = 4

Q₃ = 12

But Q₂ doesn’t exist as an element of the set (i.e. there is no middle value). In that case you should take the average of 6 and 8 to find the median, which comes to 7.

Now, remember that Interquartile Range = Q₃ – Q₁

So, for the original set of 15 elements,

I.Q.R = Q₃ – Q₁ = 15 – 4 = 11 Answer.
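
The splitting method above can be sketched as follows. The function name quartiles is hypothetical. Other quartile conventions exist (for example, the interpolating ones in Python's statistics.quantiles) and can give slightly different values, so treat this as an illustration of the split-into-halves method only.

```python
from statistics import median


def quartiles(values):
    """Q1, Q2, Q3 by splitting the sorted data into lower and upper halves.

    For an odd count the middle element is left out of both halves.
    """
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)


odd_set = [2, 2, 3, 4, 4, 6, 6, 8, 12, 12, 12, 15, 15, 16, 18]
q1, q2, q3 = quartiles(odd_set)
iqr = q3 - q1

even_set = odd_set[:-1]  # the same set without 18
e1, e2, e3 = quartiles(even_set)

print(q1, q2, q3, iqr)
print(e1, e2, e3)
```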

Cumulative Frequency:

We have already discussed Cumulative frequency a little. Let’s learn it more clearly.

If you studied stats in your graduation course, you will have used this topic frequently. If you did not study stats, let’s learn it now.

Cumulative frequency is the sum of the frequencies up to a certain point in the data set.

For instance,

Number of Students	Daily Pocket money
3	200
5	300
8	500
14	1000

In this table, the cumulative frequency for data 300 is 3 + 5, which equals 8. Similarly, the cumulative frequency up to data 500 is 3 + 5 + 8 = 16, and so on. That’s it. So if you see this term in a question, don’t get confused; it’s just as simple as explained here.
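
As a small sketch, the running totals for the table can be produced with accumulate from Python's standard itertools module. The list names are illustrative.

```python
from itertools import accumulate

frequencies = [3, 5, 8, 14]     # number of students
values = [200, 300, 500, 1000]  # daily pocket money

# Running total of the frequency column, row by row.
cumulative = list(accumulate(frequencies))

for value, cum in zip(values, cumulative):
    print(value, cum)
```

The last running total always equals the sum of the frequency column.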

Frequency Variance:

One of my students encountered this term in his GRE exam, so you should also be aware of it.

Let’s consider the set as below:

{1, 3, 9, 11, 14, 22}

First we need to find the mean of this set, which is the sum divided by the count.

⇒ Mean = (1 + 3 + 9 + 11 + 14 + 22) ⁄ 6 = 60 ⁄ 6 = 10

Now, take the positive difference (i.e. the modulus, denoted by “| |”) of each element from the mean, square each difference, and add the squares as below:

|10 – 1|² + |10 – 3|² + |10 – 9|² + |10 – 11|² + |10 – 14|² + |10 – 22|²

⇒ 9² + 7² + 1² + 1² + 4² + 12²

⇒ 81 + 49 + 1 + 1 + 16 + 144 = 292

Now, divide this by the number of elements in the set (i.e. 6)

⇒ Frequency Variance = 292 ⁄ 6 ≈ 48.67 Answer.
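
The same steps can be sketched in Python and cross-checked against pvariance from the standard statistics module, which also divides by the number of elements. The variable names are illustrative.

```python
from statistics import pvariance

data = [1, 3, 9, 11, 14, 22]

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]  # squared distance from mean
variance = sum(squared_diffs) / len(data)        # divide by number of elements

print(mean, sum(squared_diffs), variance)
print(pvariance(data))  # library cross-check of the same quantity
```

The square root of this variance is the standard deviation of the set.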