In what circumstances would you prefer to use the median instead of the mean for a set of data?

Learning Outcomes

  • Recognize, describe, and calculate the measures of the center of data: mean, median, and mode.

By now, everyone should know how to calculate the mean, median, and mode. Each gives us a measure of central tendency (i.e., where the center of our data falls), but they often give different answers. So how do we know when to use each? Here are some general rules:

  1. The mean is the most frequently used measure of central tendency and is generally considered the best measure of it. However, there are some situations where either the median or the mode is preferred.
  2. Median is the preferred measure of central tendency when:
    1. There are a few extreme scores in the distribution of the data. (NOTE: Remember that a single outlier can have a great effect on the mean.)
    2. There are some missing or undetermined values in your data.
    3. There is an open-ended distribution. (For example, if you have a data field which measures number of children and your options are [latex]0[/latex], [latex]1[/latex], [latex]2[/latex], [latex]3[/latex], [latex]4[/latex], [latex]5[/latex] or “[latex]6[/latex] or more,” then the “[latex]6[/latex] or more” field is open-ended and makes calculating the mean impossible, since we do not know exact values for this field.)
    4. You have data measured on an ordinal scale.
  3. Mode is the preferred measure when data are measured on a nominal (and sometimes even ordinal) scale.
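
The open-ended case in rule 2 can be illustrated with a minimal Python sketch. The response counts below are hypothetical (invented for illustration), and `median_category` is a helper name introduced here, not part of any library. The idea is that the median can still be located from cumulative counts even though the exact values in the "6 or more" bin are unknown, provided the middle observation does not fall in that bin.

```python
# Hypothetical survey counts: number of children -> number of households.
# The last category is open-ended, so its exact values are unknown and a
# mean cannot be computed from this table.
counts = [
    ("0", 12),
    ("1", 20),
    ("2", 25),
    ("3", 10),
    ("4", 5),
    ("5", 2),
    ("6 or more", 3),
]


def median_category(ordered_counts):
    """Return the category that contains the middle observation.

    Assumes the categories are listed in ascending order. For an even
    total, this returns the category of the lower middle observation.
    """
    total = sum(n for _, n in ordered_counts)
    if total == 0:
        raise ValueError("no observations")
    middle = (total + 1) // 2  # rank of the (lower) middle observation
    running = 0
    for label, n in ordered_counts:
        running += n
        if running >= middle:
            return label


print(median_category(counts))  # 2
```

With 77 households, the middle observation is the 39th, which falls in the "2 children" category.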

Definitions of mean and median

In mathematics and statistics, the mean or the arithmetic mean of a list of numbers is the sum of the entire list divided by the number of items in the list. For symmetric distributions, the mean is probably the best measure of central tendency. In probability theory and statistics, a median is the number separating the higher half of a sample, a population, or a probability distribution from the lower half.

How to calculate

The mean, or average, is probably the most commonly used method of describing central tendency. A mean is computed by adding up all the values and dividing that sum by the number of values. The arithmetic mean of a sample [latex]x_1, x_2, \ldots, x_n[/latex] is the sum of the sampled values divided by the number of items in the sample:

[latex]\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}[/latex]

The median is the number found at the exact middle of the set of values. It can be computed by listing all the numbers in ascending order and then locating the number in the center of that list. This works directly when the number of observations is odd; when it is even, there is no single middle value, so the usual practice is to take the mean of the two middle values.
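
The two procedures just described can be sketched in a few lines of Python. The function names `mean` and `median` are chosen here for illustration; this is one straightforward way to write them, not a reference implementation.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data.

    For an even number of observations, returns the mean of the two
    middle values.
    """
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(mean([3, 1, 2]))       # 2.0
print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```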

Example

Let us say that there are nine students in a class with the following scores on a test: 2, 4, 5, 7, 8, 10, 12, 13, 83. In this case the average score (or the mean) is the sum of all the scores divided by nine. This works out to 144/9 = 16. Note that even though 16 is the arithmetic average, it is distorted by the unusually high score of 83 compared to other scores. Almost all of the students' scores are below the average. Therefore, in this case the mean is not a good representative of the central tendency of this sample.

The median, on the other hand, is the value which is such that half the scores are above it and half the scores below. So in this example, the median is 8. There are four scores below and four above the value 8. So 8 represents the mid point or the central tendency of the sample.
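
The figures in this example can be checked with Python's standard `statistics` module. The second half of the sketch, which drops the score of 83, is an extra illustration not in the original example; it shows how much a single outlier moves the mean compared with the median.

```python
import statistics

scores = [2, 4, 5, 7, 8, 10, 12, 13, 83]

print(statistics.mean(scores))    # 16
print(statistics.median(scores))  # 8

# Illustration only: remove the unusually high score of 83 and recompute.
without_outlier = scores[:-1]
print(statistics.mean(without_outlier))    # 7.625
print(statistics.median(without_outlier))  # 7.5
```

Removing one score changes the mean from 16 to 7.625, while the median only moves from 8 to 7.5.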

Figure: Comparison of mean, median and mode of two log-normal distributions with different skewness.
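
Since the figure itself is not reproduced here, the following Python sketch gives a rough numerical counterpart. The parameter values (`sigma` of 0.25 and 1.0), the sample size, and the seed are assumptions chosen for illustration, not the settings of the original figure, and the mode is left out because estimating it from a continuous sample needs extra choices such as binning.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Two log-normal samples; a larger sigma gives a more strongly skewed shape.
mild = [random.lognormvariate(0, 0.25) for _ in range(100_000)]
strong = [random.lognormvariate(0, 1.0) for _ in range(100_000)]

for name, sample in (("sigma=0.25", mild), ("sigma=1.0", strong)):
    m = statistics.mean(sample)
    med = statistics.median(sample)
    print(f"{name}: mean={m:.3f} median={med:.3f} gap={m - med:.3f}")
```

In both samples the mean lies above the median because the long right tail pulls it upward, and the gap is wider for the more skewed sample.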

Disadvantages of Arithmetic Means and Medians

The mean is not a robust statistic: it is unduly affected by values in the sample that are unusually small or unusually large, so it does not describe every distribution well. Even so, it is easily the most widely used statistic for describing central tendency.

The disadvantage of the median is that it is harder to handle theoretically. Unlike the mean, it has no simple algebraic formula; it is found by ordering the data and locating the middle value.

Other Types of Means

There are many ways to determine the central tendency, or average, of a set of values. The mean discussed above is technically the arithmetic mean, and is the most commonly used statistic for average. There are other types of means:

Geometric Mean

The geometric mean is defined as the nth root of the product of n numbers; i.e., for a set of numbers [latex]x_1, x_2, \ldots, x_n[/latex], the geometric mean is defined as

[latex]\left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}[/latex]

Geometric means are better than arithmetic means for describing proportional growth. For example, a good application of the geometric mean is calculating the compound annual growth rate (CAGR).
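
A small Python sketch of the growth-rate use case follows. The yearly growth factors are hypothetical, and `geometric_mean` is a helper written here directly from the definition. It requires Python 3.8 or later for `math.prod`.

```python
import math

# Hypothetical yearly growth factors: +10%, +50%, -20%.
growth_factors = [1.10, 1.50, 0.80]


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("needs a non-empty list of positive numbers")
    return math.prod(values) ** (1 / len(values))


g = geometric_mean(growth_factors)
arithmetic = sum(growth_factors) / len(growth_factors)
total_growth = math.prod(growth_factors)

# Compounding the geometric mean over three years reproduces the actual
# total growth; compounding the arithmetic mean overstates it.
print(f"total growth:    {total_growth:.4f}")
print(f"geometric mean:  {g:.4f} -> compounded {g ** 3:.4f}")
print(f"arithmetic mean: {arithmetic:.4f} -> compounded {arithmetic ** 3:.4f}")
```

Here `g - 1` plays the role of the CAGR: the single yearly rate that, applied three times, gives the same overall growth.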

Harmonic Mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. The harmonic mean H of the positive real numbers [latex]x_1, x_2, \ldots, x_n[/latex] is

[latex]H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}[/latex]

A good application of harmonic means is averaging multiples. For example, it is better to use a weighted harmonic mean when calculating the average price–earnings ratio (P/E). If P/E ratios are averaged using a weighted arithmetic mean, high data points get unduly greater weights than low data points.
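
The P/E point can be checked with a small Python sketch. The portfolio is hypothetical: equal amounts invested in two stocks, which is the special case where the weighted harmonic mean reduces to the plain harmonic mean computed by `statistics.harmonic_mean`.

```python
import statistics

# Hypothetical portfolio: the same amount invested in each of two stocks.
pe_ratios = [10, 40]
invested = [100.0, 100.0]

# Earnings attributable to each holding = amount invested / (P/E).
earnings = [amount / pe for amount, pe in zip(invested, pe_ratios)]
portfolio_pe = sum(invested) / sum(earnings)

harmonic = statistics.harmonic_mean(pe_ratios)
arithmetic = statistics.mean(pe_ratios)

print(f"portfolio P/E:   {portfolio_pe:.2f}")
print(f"harmonic mean:   {harmonic:.2f}")
print(f"arithmetic mean: {arithmetic:.2f}")
```

The portfolio's actual P/E (total price over total earnings) matches the harmonic mean of 16, whereas the arithmetic mean of 25 is pulled toward the high P/E of 40.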

Pythagorean Means

The arithmetic mean, geometric mean and harmonic mean together form a set of means called the Pythagorean means. For any set of positive numbers, the harmonic mean is never larger than the geometric mean, and the geometric mean is never larger than the arithmetic mean; i.e., Harmonic mean ≤ Geometric mean ≤ Arithmetic mean.
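
The ordering can be checked numerically with the standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The input values are hypothetical, chosen only for illustration.

```python
import statistics

values = [2, 4, 8]  # hypothetical positive numbers

harmonic = statistics.harmonic_mean(values)
geometric = statistics.geometric_mean(values)
arithmetic = statistics.mean(values)

print(f"harmonic:   {harmonic:.4f}")
print(f"geometric:  {geometric:.4f}")
print(f"arithmetic: {arithmetic:.4f}")
print(harmonic <= geometric <= arithmetic)  # True
```

The three means coincide only when all the values are equal.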

Other meanings of the words

Outside statistics, "mean" has other senses: as an adjective it can describe something poor or inferior in quality. In geometry, a median of a triangle is a line segment from a vertex to the midpoint of the opposite side.

When would we prefer the median instead of the mean?

The answer is simple. If your data contain outliers, such as a single value of 1000 among otherwise small values, then you would typically use the median, because otherwise the value of the mean would be dominated by the outliers rather than the typical values.

Under what condition might you prefer to use the median rather than the mean as the best measure of central tendency?

The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.

When might we want to use the median instead of the mean to better represent the central tendency of a set of scores?

For normally distributed data, all three measures of central tendency give essentially the same answer, so any of them can be used. In skewed distributions, the median is usually the best measure because it is much less affected by extreme outliers or asymmetry in the scores. The mean and mode can differ noticeably from the median in skewed distributions.