# Introduction

Descriptive statistics describe the basic features of the data gathered from an experimental study in various ways.

They provide simple summaries about the sample via graphs and numbers, mainly measures of center and variation.

Together with graphical analysis (histograms, bar plots, pie charts), they are the cornerstone of quantitative data analysis. Data are commonly summarized with:

- Tables (frequency distributions, stem-and-leaf plots, …) that summarize the data.
- Graphical representations of the data (histograms, bar plots, pie charts).
- Summary statistics (numbers) that summarize the data.

# Tables

The most common ways of summarizing data into tables are frequency distribution, relative frequency distribution and cumulative frequency distribution tables. Another common format is the stem-and-leaf plot.

## Frequency distribution table

A **frequency distribution** summarizes the data in a table that lists the ranges (classes) into which the data fall and the frequency, that is, the number of observations, in each range.

For example, the frequency distribution of the GDP per capita of the 20 countries with the highest gross domestic product (GDP) per person is given below:

GDP per capita (US$) | Number of countries |
---|---|
80,000 and over | 1 |
70,000-79,999 | 2 |
60,000-69,999 | 0 |
50,000-59,999 | 4 |
40,000-49,999 | 8 |
less than 40,000 | 5 |

This means that among those countries, 5 have a GDP per capita of less than 40,000 dollars, 8 have a GDP per capita between 40,000 and 49,999 dollars, and so on.

Notice that such a table is already a summary that discards information: we no longer know the exact GDP per capita of each country, only the range it falls in.

Another way of summarizing data, in which we lose less information, is the stem-and-leaf plot.
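A frequency table like the one above can be computed from raw values by counting how many fall into each class. The sketch below is illustrative only: it assumes half-open classes [lower, upper) defined by a list of class edges, uses a hypothetical helper name `frequency_table`, and runs on made-up sample values rather than the real country data. It also reports relative frequencies (each count divided by the total).

```python
from bisect import bisect_right


def frequency_table(values, edges):
    """Count values in each half-open class [edges[i], edges[i+1]).

    Values below edges[0] or at/above edges[-1] are ignored.
    Returns a list of (lower, upper, count, relative_frequency) tuples.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Index of the class whose lower edge is <= v.
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    return [
        (edges[i], edges[i + 1], c, c / total if total else 0.0)
        for i, c in enumerate(counts)
    ]


# Hypothetical GDP-per-capita values in thousands of dollars (illustration only).
sample = [39, 39, 40, 42, 44, 55, 57, 70, 76, 81]
for lower, upper, count, rel in frequency_table(sample, [30, 40, 50, 60, 70, 80, 90]):
    print(f"{lower}-{upper - 1}: {count} ({rel:.0%})")
```

Using `bisect_right` places a value that equals a class edge (such as 40) in the class that starts at that edge, matching the "40,000-49,999" style of the table.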

## Stem-and-Leaf Plots

The stem-and-leaf plot works best when the data can be split into tens and units. Suppose that the GDP per capita values from the example above, in thousands of dollars, are: 81 76 70 57 55 55 89 46 46 45 44 44 42 40 39 39 39 39 39. In other words, one country (namely Luxembourg) has a GDP per capita of 81,000 dollars, Qatar has a GDP per capita of 76,000 dollars, and the USA, for example, has a GDP per capita of 46,000 dollars. These data can be summarized in the following stem-and-leaf plot:

In the stem-and-leaf plot above, **8|1** means that there is one country with a GDP per capita in the 80,000s, and that this country has a GDP per capita of 81,000 dollars.

**7|06** means that there are 2 countries with a GDP per capita between $70,000 and $79,999, and that their GDP per capita values are 7**0**,000 and 7**6**,000 dollars. That is why the digits 0 and 6 appear in the units (leaf) part, to the right of the vertical bar.
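The construction can be sketched in a few lines: split each value into a stem (tens) and a leaf (units) and collect the leaves per stem. This is a minimal illustration under stated assumptions: the helper name `stem_and_leaf` is hypothetical, values are assumed to be non-negative integers, and empty stems are kept so that gaps stay visible. It is applied here to the first six values of the list above.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # List stems from largest to smallest, including empty stems.
    for stem in range(max(leaves), min(leaves) - 1, -1):
        digits = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        lines.append(f"{stem}|{digits}")
    return lines


# First six GDP-per-capita values from the text, in thousands of dollars.
print("\n".join(stem_and_leaf([81, 76, 70, 57, 55, 55])))
# prints:
# 8|1
# 7|06
# 6|
# 5|557
```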

# Graphical representations

Examples of graphical representations are the histogram, bar plot, dot plot and pie chart.

## Histogram

A histogram is a vertical bar chart in which the heights of the bars represent the frequencies (also called counts) of the classes. For the GDP per capita example, for which we made a frequency distribution above, the histogram would look like this:

Note that there is no space between the bars, that the height of each bar represents the frequency of its class, and that the width of each bar is the class width chosen for the frequency distribution.

## Frequency Polygon

A frequency polygon is another way of graphically representing the data: instead of drawing bars, we plot a point at the height of each class frequency (usually above the class midpoint) and connect consecutive points with straight line segments, as in the example below.

Notice that this frequency polygon represents the same data given in the histogram above.
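The vertices of a frequency polygon can be computed directly from the class edges and counts. This sketch assumes equal-width classes and follows a common textbook convention of adding a zero-frequency point one class width beyond each end, so the polygon meets the horizontal axis; the helper name and the class counts below are hypothetical.

```python
def frequency_polygon_points(edges, counts):
    """Return (x, y) vertices of a frequency polygon.

    Each class [edges[i], edges[i+1]) is represented by its midpoint.
    Assumes equal-width classes; adds a zero-frequency point one class
    width beyond each end so the polygon closes on the x-axis.
    """
    width = edges[1] - edges[0]
    mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(counts) + [0]
    return list(zip(xs, ys))


# Hypothetical classes (thousands of dollars) and counts, for illustration only.
points = frequency_polygon_points([40, 50, 60, 70, 80], [8, 4, 0, 2])
print(points)
# prints: [(35.0, 0), (45.0, 8), (55.0, 4), (65.0, 0), (75.0, 2), (85.0, 0)]
```

Plotting these points and joining them with straight lines gives the polygon; the histogram bars for the same classes would be centred on the same midpoints.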

## Cumulative frequency distributions and Ogives

A cumulative frequency table is obtained by counting how many items fall below a certain boundary. For example, from the frequency table above, we can create a new table, called a cumulative frequency table:

GDP per capita (US$) | Cumulative number of countries |
---|---|
less than 40,000 | 5 |
less than 50,000 | 13 |
less than 60,000 | 17 |
less than 70,000 | 17 |
less than 80,000 | 19 |
less than 90,000 | 20 |

We can then use this table to draw a cumulative frequency polygon, called an **ogive**.
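Each entry of the cumulative table is a running total of the class frequencies. The sketch below assumes the class frequencies, lowest class first, that produce the cumulative table above (5, 8, 4, 0, 2, 1) and uses the standard-library `itertools.accumulate` to form the running totals; the variable names are illustrative.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies (lowest class first)
# that produce the cumulative table above.
upper_bounds = [40_000, 50_000, 60_000, 70_000, 80_000, 90_000]
counts = [5, 8, 4, 0, 2, 1]

# Running totals: number of countries below each upper boundary.
cumulative = list(accumulate(counts))

for bound, total in zip(upper_bounds, cumulative):
    print(f"less than {bound:,}: {total}")

# An ogive joins the points (upper boundary, cumulative count) with straight lines.
ogive_points = list(zip(upper_bounds, cumulative))
```

The last cumulative value always equals the total number of observations, here 20.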

## Dot plot

There are several kinds of dot plots and dot charts. Here we concentrate on one that is similar to a histogram, except that instead of bars, dots are piled on top of each other, as can be seen below:
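A dot plot of this kind can be sketched in plain text by stacking one symbol per occurrence above each value. The function name `text_dot_plot` and the `*` symbol are illustrative choices, and the sample values are hypothetical; unlike a true dot plot, this sketch places only the values that actually occur along the axis rather than an evenly spaced scale.

```python
from collections import Counter


def text_dot_plot(values):
    """Return rows of a text dot plot: one column per distinct value, one '*' per occurrence."""
    counts = Counter(values)
    xs = sorted(counts)
    height = max(counts.values())
    width = max(len(str(x)) for x in xs)  # column width so labels line up
    rows = []
    # Build rows from the top down so the dots pile up from the axis.
    for level in range(height, 0, -1):
        cells = ("*" if counts[x] >= level else " " for x in xs)
        rows.append(" ".join(c.center(width) for c in cells))
    rows.append(" ".join(str(x).center(width) for x in xs))  # axis labels
    return rows


# Hypothetical values in thousands of dollars, for illustration only.
print("\n".join(text_dot_plot([39, 39, 39, 40, 42, 44, 44])))
```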

## Pie Chart

A pie chart is a useful graphical representation for qualitative data.

For example, the US Census Bureau compiled the following chart for single mothers.

Using this qualitative data, we can create a pie chart: divide the value in each category by the total to get the proportion of that category, and then multiply by 360 degrees to find the angle of the slice of the pie corresponding to that category.
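The angle calculation can be written as angle = value × 360 / total. The sketch below uses made-up categories and counts (the Census Bureau figures are not reproduced here), and the helper name `pie_angles` is hypothetical.

```python
def pie_angles(counts):
    """Map each category to the angle (in degrees) of its pie slice."""
    total = sum(counts.values())
    # proportion of the total times 360 degrees
    return {name: value * 360 / total for name, value in counts.items()}


# Hypothetical category counts, for illustration only (not Census Bureau data).
data = {"A": 50, "B": 30, "C": 20}
for name, angle in pie_angles(data).items():
    print(f"{name}: {angle:.1f} degrees")
```

Multiplying before dividing keeps the arithmetic exact for these integer counts, and the angles always add up to 360 degrees.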

# Summary statistics

In general, statistical data can be briefly described as a list of subjects or units and the data associated with each of them. Although most research uses many data types for each unit, this introduction treats only the simplest case.

There are two main objectives when formulating a summary statistic:

1. To choose a statistic that shows how the units are similar. Statistical textbooks call such a statistic a measure of central tendency.

2. To choose another statistic that shows how they differ. This kind of statistic is often called a measure of statistical variability.

When summarizing a quantity like length, weight or age, it is common to meet the first objective with the arithmetic mean, the median, or, in the case of a unimodal distribution, the mode. Sometimes specific values of the cumulative distribution function, called quantiles, are used instead.
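Python's standard-library `statistics` module provides these measures of central tendency directly. The sample below is hypothetical; note that `statistics.quantiles` uses its default "exclusive" method, and other textbooks or software may compute quartiles with a slightly different convention.

```python
import statistics

# Hypothetical sample (for example, ages), for illustration only.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)       # arithmetic mean: sum / count = 30 / 6
median = statistics.median(data)   # average of the two middle values, (3 + 5) / 2
mode = statistics.mode(data)       # most frequent value
quartiles = statistics.quantiles(data, n=4)  # cut points Q1, Q2, Q3

print(mean, median, mode, quartiles)
```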

The most common measures of variability for quantitative data are the variance; its square root, the standard deviation; the range; the interquartile range; and the average absolute deviation (average deviation).
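The same hypothetical sample can be used to compute these measures of variability. This is a sketch under stated assumptions: it uses the sample variance and standard deviation (dividing by n − 1; `statistics.pvariance` and `statistics.pstdev` divide by n instead), the default "exclusive" quartile method, and the average absolute deviation taken about the mean.

```python
import math
import statistics

# Hypothetical sample, for illustration only.
data = [2, 3, 3, 5, 7, 10]

var = statistics.variance(data)    # sample variance: sum of squared deviations / (n - 1)
sd = statistics.stdev(data)        # sample standard deviation = sqrt(variance)
data_range = max(data) - min(data)  # largest minus smallest value
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                      # interquartile range
mean = statistics.mean(data)
avg_abs_dev = sum(abs(x - mean) for x in data) / len(data)  # about the mean

print(var, sd, data_range, iqr, avg_abs_dev)
```

Here the squared deviations from the mean 5 add up to 46, so the sample variance is 46 / 5 = 9.2.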