Box Plot: A box plot is a type of chart that depicts a group of numerical data through its quartiles. It is a simple way to visualize the shape of the data, and it makes comparing the characteristics of data between categories easy.
In this article, we are going to discuss the following topics:
1) Understanding the components of a box plot
2) How to create a box plot
3) Uses of a box plot
4) How to compare box plots
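Let's proceed step by step.
1) Understanding the components of a box plot
A box plot gives a five-number summary of a set of data, which is:
Minimum: the smallest value that is not an outlier
First Quartile (Q1): the value below which 25% of the data falls
Median (Q2): the middle value of the data
Third Quartile (Q3): the value below which 75% of the data falls
Maximum: the largest value that is not an outlier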
Note: The box plot shown in the above diagram is a perfect plot with no skewness. Plots can be skewed, and the median might not be at the center of the box.
The area inside the box (the middle 50% of the data) is known as the Inter Quartile Range (IQR). The IQR is calculated as:
IQR = Q3 - Q1
Outliers are the data points below the lower limit or above the upper limit. The limits are calculated as:
Lower Limit = Q1 - 1.5*IQR
Upper Limit = Q3 + 1.5*IQR
Values outside these limits are considered outliers, and the minimum and maximum are the smallest and largest values that lie within the limits.
2) How to create a box plot
Let us take some sample data to understand how to create a box plot. Here are the runs scored by a cricket team in a league of 12 matches: 100, 120, 110, 150, 110, 140, 130, 170, 120, 220, 140, 110.
To draw a box plot for the given data, we first arrange the data in ascending order and then find the minimum, first quartile, median, third quartile and maximum.
Ascending Order: 100, 110, 110, 110, 120, 120, 130, 140, 140, 150, 170, 220
Median (Q2) = (120+130)/2 = 125, since there is an even number of values.
To find the First Quartile, we take the first six values and find their median.
Q1 = (110+110)/2 = 110
For the Third Quartile, we take the last six values and find their median.
Q3 = (140+150)/2 = 145
Note: If the total number of values is odd, we exclude the median while calculating Q1 and Q3. Here, since there were two central values, we included them.
Now, we need to calculate the Inter Quartile Range.
IQR = Q3 - Q1 = 145 - 110 = 35
We can now calculate the Upper and Lower Limits to find the minimum and maximum values, and also the outliers, if any.
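The arithmetic in this section can be sketched in a few lines of Python. This is a minimal sketch, assuming the median-of-halves method described above; the function name `box_plot_summary` and the dictionary keys are made up for illustration, and tools that use a different quartile convention may report slightly different values for Q1 and Q3.

```python
from statistics import median


def box_plot_summary(values):
    """Five-number summary, limits and outliers (hypothetical helper).

    Needs at least two values, otherwise the halves are empty.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2

    # Median-of-halves quartiles: with an odd count the middle value
    # is left out of both halves, as described in the note above.
    lower_half = data[:half]
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]

    q1 = median(lower_half)
    q2 = median(data)
    q3 = median(upper_half)
    iqr = q3 - q1

    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr

    # Minimum and maximum are taken only from points within the limits.
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = [x for x in data if x < lower_limit or x > upper_limit]

    return {
        "minimum": min(inside),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "maximum": max(inside),
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "outliers": outliers,
    }


runs = [100, 120, 110, 150, 110, 140, 130, 170, 120, 220, 140, 110]
summary = box_plot_summary(runs)
print(summary)
```

For the twelve scores, the quartiles and IQR returned by this sketch agree with the hand calculation: Q1 = 110, Median = 125, Q3 = 145 and IQR = 35.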
Lower Limit = Q1 - 1.5*IQR = 110 - 1.5*35 = 57.5
Upper Limit = Q3 + 1.5*IQR = 145 + 1.5*35 = 197.5
So the minimum and maximum within the range [57.5, 197.5] for our data are:
Minimum = 100
Maximum = 170
The outliers, which lie outside this range, are:
Outliers = 220
Now that we have all the information, we can draw the box plot, which is as below.
We can see from the diagram that the median is not exactly at the centre of the box and that one whisker is longer than the other. We also have one outlier.
3) Uses of a Box Plot
a) If the median is at the center of the box and the whiskers are almost the same length on both ends, the data is roughly symmetric, which is consistent with (but does not prove) a Normal Distribution.
b) If the median lies closer to the First Quartile and the whisker at the lower end is shorter (as in the above example), the data has a Positive Skew (Right Skew).
c) If the median lies closer to the Third Quartile and the whisker at the upper end is shorter, the data has a Negative Skew (Left Skew).
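Rules a) to c) can be expressed as a small check on the five-number summary. The sketch below is only an illustration: the function name `describe_skew`, the `tol` parameter (how close two lengths must be to count as "almost the same") and the returned labels are assumptions, not part of any library. Cases where the box and the whiskers disagree are reported as "unclear".

```python
def describe_skew(minimum, q1, med, q3, maximum, tol=0.0):
    """Apply rules a) to c) to a five-number summary (hypothetical helper)."""
    lower_box = med - q1            # median to First Quartile
    upper_box = q3 - med            # median to Third Quartile
    lower_whisker = q1 - minimum    # length of the lower whisker
    upper_whisker = maximum - q3    # length of the upper whisker

    # a) median centred and whiskers (almost) equal
    if (abs(lower_box - upper_box) <= tol
            and abs(lower_whisker - upper_whisker) <= tol):
        return "symmetric"
    # b) median nearer Q1 and lower whisker shorter
    if lower_box < upper_box and lower_whisker < upper_whisker:
        return "positive skew"
    # c) median nearer Q3 and upper whisker shorter
    if lower_box > upper_box and lower_whisker > upper_whisker:
        return "negative skew"
    # Box and whiskers point in different directions
    return "unclear"


# Values from the cricket example: min 100, Q1 110, median 125, Q3 145, max 170
print(describe_skew(100, 110, 125, 145, 170))
```

For the cricket example, the median is 15 runs from Q1 and 20 from Q3, and the lower whisker (10) is shorter than the upper one (25), so rule b) applies.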
4) How to compare box plots
As discussed at the beginning of the article, box plots make comparing the characteristics of data between categories very easy. Let us look at how we can compare different box plots and derive statistical conclusions from them. Let us take the below two plots as an example:
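As a rough illustration of such a comparison, the sketch below puts two quantities that are commonly read off a box plot side by side: the median (where the data is centred) and the IQR (how spread out the middle 50% is). "Team A" reuses the runs from section 2; the "Team B" scores are made up purely for this example, and the helper name `quartiles` is hypothetical. The quartiles use the same median-of-halves method as before.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    # With an odd count, skip the middle value when forming the upper half.
    upper_half = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower_half), median(data), median(upper_half)


teams = {
    # Runs from the worked example in section 2
    "Team A": [100, 120, 110, 150, 110, 140, 130, 170, 120, 220, 140, 110],
    # Made-up scores, for illustration only
    "Team B": [90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145],
}

stats = {}
for name, scores in teams.items():
    q1, q2, q3 = quartiles(scores)
    stats[name] = (q2, q3 - q1)  # (median, IQR)
    print(f"{name}: median = {q2}, IQR = {q3 - q1}")
```

With these numbers, Team A has the higher median (125 against 117.5) and the wider box (IQR of 35 against 30), so on a side-by-side plot its box would sit higher and be slightly taller. To draw the plots themselves, matplotlib's `boxplot` function accepts a list of datasets, though it computes quartiles by percentile interpolation, so the box edges may differ slightly from the values here.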
This is all for box plots. You should now have an idea of how to make box plots and how to derive information from them. For any queries, do leave a comment down below.