Which measure of central tendency would be a better choice to use if a data set has some extreme values, and why?

Measures of central tendency are numbers that indicate the centre of a set of ordered numerical data.

The three common measures of central tendency are the mean, the median and the mode.

The mean gives each element of a data set equal weight. When there are no extreme numbers in the data set (no very low or very high numbers), the mean is a good choice for a measure of central tendency. Statisticians state that "the mean is the most unbiased measure of central tendency".

The median depends only on the one or two elements in the middle of the ordered data. When there are extreme numbers in the data set (very low or very high numbers), the median is a good choice for a measure of central tendency, because the extreme numbers have little effect (or no effect at all) on it.

The mode is a good choice for a measure of central tendency when the data has many identical data values.

The data below are the hourly sales of songs for an on-line music store over a ten-hour period.

RAW DATA: { 11, 10, 13, 15, 73, 69, 67, 66, 14, 12 }

ORDERED DATA: { 10, 11, 12, 13, 14, 15, 66, 67, 69, 73 }

Mean: 35

Median: 14.5

Mode: There is no mode.

Given the way the data is distributed in this example (with many small and many large numbers), the arithmetic mean is probably the most appropriate measure of central tendency.

The mean number of songs sold per hour at an on-line music store over a ten-hour period is 35.
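The three figures for this example can be checked with a short script. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and are not part of the original example, and treating "every value occurs once" as "no mode" is a convention assumed here.

```python
from statistics import mean, median, multimode

# Hourly song sales from the example above.
hourly_sales = [11, 10, 13, 15, 73, 69, 67, 66, 14, 12]

sales_mean = mean(hourly_sales)
sales_median = median(hourly_sales)

# multimode() returns every value tied for the highest count. When no value
# repeats it returns all of them, which is treated here as "no mode".
candidates = multimode(hourly_sales)
sales_modes = [] if len(candidates) == len(hourly_sales) else candidates

print("mean:", sales_mean)
print("median:", sales_median)
print("modes:", sales_modes if sales_modes else "none")
```

Note that the median (14.5) falls in the gap between the two clusters of values, which is one reason the text prefers the mean for this data set.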

The data below are the yearly wages (in dollars) of ten people working at an on-line music store.

DATA: { 41 000, 41 000, 41 000, 41 000, 43 000, 45 000, 48 000, 50 000, 50 000, 250 000 }

Mean: 65 000

Median: 44 000

Mode: 41 000

Given the way the data is distributed in this example (with one person's yearly wage being so large), the median is probably the best measure of central tendency.

NOTE: Nine people are below the mean and one person is above the mean, so the mean is probably not the most appropriate measure of central tendency.

NOTE: The largest group of people working at the store (four of the ten in this case) are new employees who earn "starting wages". The mode, therefore, describes only these starting wages and is probably not the most appropriate measure of central tendency.

The median yearly wage of ten people working at an on-line music store is $44 000.
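To see how strongly the single large wage pulls on the mean, one can recompute both measures with that wage left out. This is only an illustrative sketch using Python's standard `statistics` module; dropping the top value is a device for comparison, not a recommendation to discard data.

```python
from statistics import mean, median

# Yearly wages (in dollars) from the example above.
wages = [41_000, 41_000, 41_000, 41_000, 43_000,
         45_000, 48_000, 50_000, 50_000, 250_000]

# Leave out the single largest wage to see how much each measure depends on it.
without_top = sorted(wages)[:-1]

mean_all, median_all = mean(wages), median(wages)
mean_trimmed, median_trimmed = mean(without_top), median(without_top)

# Count how many wages fall below the mean of the full data set.
below_mean = sum(w < mean_all for w in wages)

print(f"all ten:     mean={mean_all:.0f}, median={median_all:.0f}")
print(f"without top: mean={mean_trimmed:.0f}, median={median_trimmed:.0f}")
print(f"wages below the mean: {below_mean} of {len(wages)}")
```

Leaving out the one large wage moves the mean by more than 20 000 dollars, while the median moves by only 1 000 dollars.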

The data below are the seventeen shoe sizes of one type of shoe sold in one day at a local shoe store.

DATA: { 5, 6, 7, 7, 7, 7, 7, 7, 8, 9, 9, 10, 11, 12, 13, 13, 15 }

Mean: 9

Median: 8

Mode: 7

Given the way the data is distributed in this example (with so many size seven shoes being sold), the mode is probably the best measure of central tendency.

The modal shoe size of one type of shoe sold at a local shoe store is size seven.
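A frequency count makes it clear why the mode is informative here. The sketch below uses Python's standard `collections.Counter` and `statistics` modules; the variable names are illustrative.

```python
from collections import Counter
from statistics import mean, median, mode

# Shoe sizes sold in one day, from the example above.
shoe_sizes = [5, 6, 7, 7, 7, 7, 7, 7, 8, 9, 9, 10, 11, 12, 13, 13, 15]

# Tally how many pairs of each size were sold.
size_counts = Counter(shoe_sizes)
most_common_size, pairs_sold = size_counts.most_common(1)[0]

print("mean:", mean(shoe_sizes))
print("median:", median(shoe_sizes))
print("mode:", mode(shoe_sizes))
print(f"size {most_common_size} was sold {pairs_sold} times out of {len(shoe_sizes)}")
```

For a store deciding which size to restock, the most frequently sold size is more useful than an average size.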

What happens to the mean and median if we add a constant to each observation in a data set, or multiply each observation by a constant?

Consider, for example, an instructor who curves an exam by adding five points to each student's score. What effect does this have on the mean and the median? Adding a constant to each value shifts both the mean and the median by that same constant.

For example, for a set of 10 aptitude scores with a mean of 82.1 and a median of 81, adding 5 to each score gives a new mean of 87.1 (the original mean of 82.1 plus 5) and a new median of 86 (the original median of 81 plus 5).

Similarly, if each observed data value is multiplied by a constant, the new mean and median are the original mean and median multiplied by that constant. Returning to the 10 aptitude scores, if all of the original scores were doubled, then the new mean and new median would be double the original mean and median. As we will learn shortly, the effect is not the same on the variance!
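Both rules can be checked numerically. Because the individual aptitude scores are not listed here, the sketch below reuses the hourly sales data from the first example instead; that substitution, and the use of the population variance from Python's standard `statistics` module, are assumptions made for illustration.

```python
from statistics import mean, median, pvariance

# Hourly song sales from the first example (used in place of the aptitude scores).
data = [10, 11, 12, 13, 14, 15, 66, 67, 69, 73]

shifted = [x + 5 for x in data]   # add a constant to every observation
doubled = [x * 2 for x in data]   # multiply every observation by a constant

for label, values in [("original", data), ("plus 5", shifted), ("times 2", doubled)]:
    print(f"{label:9s} mean={mean(values)}, median={median(values)}, "
          f"variance={pvariance(values)}")
```

Adding 5 raises the mean and the median by 5 and leaves the variance unchanged, while doubling the data doubles the mean and the median but multiplies the variance by 4 (the square of the constant).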

Looking Ahead!

Why would you want to know this? One reason, especially for those moving onward to more applied statistics (e.g. regression, ANOVA), is transforming data. For many applied statistical methods, a required assumption is that the data is normal, or very nearly bell-shaped. When the data is not normal, statisticians will often transform it using one of numerous techniques, e.g. a logarithmic transformation. We just need to remember that the original data was transformed!
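As a rough illustration of a logarithmic transformation, the sketch below applies a base-10 logarithm to the yearly wages from the earlier example and then converts the results back to dollars. The choice of base 10 and of this data set are assumptions made here for illustration; this is not a test of normality.

```python
from math import log10
from statistics import mean, median

# Yearly wages (in dollars) from the earlier example.
wages = [41_000, 41_000, 41_000, 41_000, 43_000,
         45_000, 48_000, 50_000, 50_000, 250_000]

# Transform: work with the logarithm of each wage.
log_wages = [log10(w) for w in wages]
mean_log = mean(log_wages)
median_log = median(log_wages)

# Remember that the data was transformed: undo the logarithm before
# reporting anything in dollars.
back_transformed_mean = 10 ** mean_log
back_transformed_median = 10 ** median_log

print(f"mean of logs, converted back to dollars:   {back_transformed_mean:.0f}")
print(f"median of logs, converted back to dollars: {back_transformed_median:.0f}")
```

On this data the back-transformed mean of the logs (the geometric mean) is roughly 52 700 dollars, which sits closer to the median of 44 000 dollars than the arithmetic mean of 65 000 dollars does.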

Shape

The shape of the data helps us to determine the most appropriate measure of central tendency. The three most important descriptions of shape are Symmetric, Left-skewed, and Right-skewed. Skewness is a measure of the degree of asymmetry of the distribution.

Symmetric

  • mean, median, and mode are all the same here
  • no skewness is apparent
  • the distribution is described as symmetric
[Figure: a symmetrical distribution, where mean = median = mode.]

Left-Skewed or Skewed Left

  • mean < median
  • long tail on the left
[Figure: a left-skewed distribution, with the mean, median and mode marked.]

Right-skewed or Skewed Right

  • mean > median
  • long tail on the right
[Figure: a right-skewed distribution, with the mean, median and mode marked.]

Note! When the data is very skewed, it is better to use the median as the measure of central tendency, since the median is not much affected by extreme values.
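The mean-versus-median comparisons listed above can be turned into a quick check. The helper `skew_direction` below is hypothetical and applies only that rule of thumb; it is not a formal skewness statistic. The left-skewed data is produced here by mirroring the yearly wages from the earlier example, which is an assumption made purely for illustration.

```python
from statistics import mean, median

def skew_direction(values):
    """Rule-of-thumb shape label based only on comparing mean and median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "roughly symmetric"

# Yearly wages from the earlier example: one very large value on the right.
wages = [41_000, 41_000, 41_000, 41_000, 43_000,
         45_000, 48_000, 50_000, 50_000, 250_000]

# Mirror the wages around their range so the long tail is on the left instead.
mirrored = [max(wages) + min(wages) - w for w in wages]

print("wages:   ", skew_direction(wages))
print("mirrored:", skew_direction(mirrored))
print("1 to 5:  ", skew_direction([1, 2, 3, 4, 5]))
```

In both skewed cases the mean is pulled toward the long tail while the median stays near the bulk of the data.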

What measure of central tendency is best used if there are extreme values?

The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.

Which measure of central tendency would be a better choice to use if a data set has some extreme values?

The median is usually the most appropriate measure of central tendency when the data has outliers, because the value of the mean can be distorted by them.

Which measure of central tendency is best, and why?

The mean is generally considered the best measure of central tendency and is the most frequently used one. However, there are some situations where the other measures of central tendency are preferred, for example when there are a few extreme scores in the distribution or when some scores have undetermined values.

What is the best measure of central tendency to describe your data?

The mean is the most frequently used measure of central tendency and is generally considered the best one. However, there are some situations where either the median or the mode is preferred. The median is the preferred measure of central tendency when there are a few extreme scores in the distribution of the data.