# Descriptive Statistics

## Characteristics of data

**Data is information; information is knowledge.**

Humans collect information on a day-to-day basis.

- Companies collect information about customer satisfaction, the number of sales of a product per day, the number of items bought at a certain price, customer preferences, company profits and earnings, and customers' phone numbers, addresses, age and sex, …
- Medical services and insurance companies collect information about patients' vital signs, medical history, insurance company, address, and the effects of prescriptions, among others. Doctors and pharmaceutical companies also run clinical trials to see whether certain medicines and drugs are effective, and under what conditions.
- Governments collect information about the nation's population, age, gender, salary range, …
- Scientists collect information on the conditions and the results of their experiments.
- Credit card companies and banks collect information about customers' salaries, the prices of their houses, the number and amount of loans they have, their addresses, buying preferences, …
- Amazon(TM) collects information about what you buy through their website, your recent searches, your favorite items, … When you log in again, the items you are likely to buy appear at the bottom of the screen as suggestions.
- Google:
  - Google collects information about the searches you do on their website and adds sponsored links to websites where you could buy a related item. Google gets paid every time you click on one of those sponsored links.
  - Gmail searches for keywords in your emails and adds advertisements in the blue bar at the top of your email.

In the world we live in today, *information is everywhere*. With such a wealth of information, one sometimes can't see the forest for the trees.

For that reason, it is very important to organize information (data).

Some important measures to summarize information are:

- **Center:** a measure of the middle or average value. For example, when opening a store, the owner might be interested in the average age of the people who walk by, in order to cater the products to that clientele.
- **Variation:** a measure of how much the data values vary. For example, in a clinical trial, one might be interested in how much the responses to a given drug vary. If the responses vary too much, the drug would most likely not be approved by the Food and Drug Administration.
- **Distribution:** the nature or shape of the distribution of the data. For example, when opening a teenage clothing store, the distribution of the people walking by is important as well. Of the two distributions in the accompanying figure, the walk-by population in the red distribution is better for the business than the one in the blue distribution. Why?
- **Outliers:** values that lie very far from the rest of the data.
- **Time:** characteristics of the data that change over time.
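As a rough illustration of center and variation, the sketch below computes a few common summaries with Python's standard `statistics` module. The `ages` list is hypothetical, made up for this example, and the standard deviation and the range are only two of several possible measures of variation.

```python
import statistics

# Hypothetical ages of people walking past a store (illustrative data only).
ages = [15, 16, 16, 17, 18, 19, 21, 24, 35, 62]

mean_age = statistics.mean(ages)      # center: arithmetic average
median_age = statistics.median(ages)  # center: middle value of the sorted data
spread = statistics.stdev(ages)       # variation: sample standard deviation
age_range = max(ages) - min(ages)     # variation: highest value minus lowest value

print(f"mean={mean_age:.1f}, median={median_age}, stdev={spread:.1f}, range={age_range}")
```

In this made-up sample the single value 62 sits far from the rest, which pulls the mean (24.3) well above the median (18.5). That is one reason outliers are worth looking at separately.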

## Data

### Frequency distributions

Here is an example of a frequency distribution, which is really just a table where you tally how many items fall in a certain category.

Cotinine Level | # of smokers |
---|---|
0-99 | 11 |
100-199 | 12 |
200-299 | 14 |
300-399 | 1 |
400-499 | 2 |
500-599 | 0 |

Some definitions are useful:

**Lower class limits**: 0, 100, …

**Upper class limits**: 99, 199, …

**Class boundaries**: numbers used to separate classes without gaps; 99.5, 199.5,…

**Class midpoints**: center of class; 49.5, 149.5, …

**Class width**: difference between two consecutive lower (or upper) class limits: 100
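The definitions above can be checked against the cotinine table with a few lines of Python. This is only a sketch: the variable names are chosen for illustration, and only the boundaries between neighbouring classes are computed (the outermost boundaries are left out).

```python
# Lower and upper class limits, copied from the cotinine table.
lower_limits = [0, 100, 200, 300, 400, 500]
upper_limits = [99, 199, 299, 399, 499, 599]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: average of each class's lower and upper limit.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: halfway between one class's upper limit
# and the next class's lower limit, so the classes meet without gaps.
boundaries = [(hi + nxt) / 2 for hi, nxt in zip(upper_limits, lower_limits[1:])]

print(class_width)
print(midpoints)
print(boundaries)
```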

### Constructing a frequency distribution

1. Decide on the number of classes n, typically between 5 and 20.
2. Compute the class width: (highest value - lowest value) / n, usually rounded up to a convenient number.
3. Choose a starting point: the lowest data value, or a convenient value slightly smaller than it.
4. List the lower class limits.
5. List the upper class limits.
6. Tally the data: count the data values falling in each class.
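The steps above can be sketched in Python as follows. The function name `frequency_distribution` and the data are hypothetical, made up for this example. The sketch assumes integer data, starts at the lowest data value, and picks the smallest whole-number width strictly larger than the range divided by n, so that the highest value always falls in the last class.

```python
def frequency_distribution(values, n_classes):
    """Tally integer data into n_classes equal-width classes.

    Returns a list of (lower limit, upper limit, frequency) tuples.
    """
    low, high = min(values), max(values)
    # Class width: range / n, rounded up so the highest value still fits.
    width = (high - low) // n_classes + 1
    # Lower limits start at the lowest value and step by the class width.
    lower_limits = [low + i * width for i in range(n_classes)]
    # For integer data, each upper limit is one less than the next lower limit.
    upper_limits = [lo + width - 1 for lo in lower_limits]
    # Tally: count the data values falling in each class.
    counts = [0] * n_classes
    for v in values:
        counts[(v - low) // width] += 1
    return list(zip(lower_limits, upper_limits, counts))


# Hypothetical measurements, made up for illustration.
data = [1, 3, 47, 86, 112, 121, 149, 173, 198, 210, 222, 250, 277, 289, 313, 491]
table = frequency_distribution(data, 5)
for lo, hi, count in table:
    print(f"{lo}-{hi}: {count}")
```

In practice one would usually round the width and the starting point to more convenient numbers (for example 100 and 0), as in the cotinine table.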

### Cumulative frequency distribution
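In a cumulative frequency distribution, the frequency for each class is the running total of the frequencies up to and including that class. A minimal sketch using the counts from the cotinine table (the "Less than" labels are one common way to present it, not the only one):

```python
from itertools import accumulate

# Upper class limits and frequencies from the cotinine table.
upper_limits = [99, 199, 299, 399, 499, 599]
counts = [11, 12, 14, 1, 2, 0]

# Running total of the class frequencies.
cumulative = list(accumulate(counts))

for hi, total in zip(upper_limits, cumulative):
    print(f"Less than {hi + 1}: {total}")
```

The last cumulative frequency always equals the total number of data values, here 40.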

## Visualizing data

### Histogram

A histogram is a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies.
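As a quick sketch, the cotinine frequencies can be drawn as a text histogram without any plotting library. Note that this version is rotated: the classes run down the page and each bar extends to the right, with one `#` per smoker. The helper name `text_histogram` is made up for this example.

```python
# Classes and frequencies from the cotinine table.
classes = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-599"]
counts = [11, 12, 14, 1, 2, 0]


def text_histogram(labels, frequencies):
    """Return one line per class: right-aligned label, bar of '#', frequency."""
    label_width = max(len(label) for label in labels)
    return [
        f"{label:>{label_width}} | {'#' * freq} {freq}"
        for label, freq in zip(labels, frequencies)
    ]


lines = text_histogram(classes, counts)
print("\n".join(lines))
```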

### (Relative) frequency histograms, polygons and ogives
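A relative frequency is a class frequency divided by the total number of data values, so the relative frequencies add up to 1 (or 100%). A short sketch with the cotinine counts:

```python
# Classes and frequencies from the cotinine table.
classes = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-599"]
counts = [11, 12, 14, 1, 2, 0]

total = sum(counts)
# Relative frequency: share of all data values that fall in each class.
relative = [c / total for c in counts]

for label, r in zip(classes, relative):
    print(f"{label}: {r:.1%}")
```

A relative frequency histogram has the same shape as the frequency histogram; only the vertical scale changes.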

### Other ways of representing data

- Dot plot: find out what this is!
- Stem-and-leaf plot
  - keeps track of all your data
  - only works in certain specific cases
  - condensed stem-and-leaf plot
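A stem-and-leaf plot can be sketched in a few lines of Python. This version assumes non-negative integers, uses the tens as the stem and the units digit as the leaf, and works on a hypothetical list of scores made up for illustration; the function name `stem_and_leaf` is also just for this example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot (stem = tens, leaf = units)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


# Hypothetical test scores, made up for illustration.
scores = [52, 58, 61, 63, 63, 67, 70, 74, 75, 75, 79, 91]
plot = stem_and_leaf(scores)
print("\n".join(plot))
```

Every original value can be read back from the plot, which is what "keeps track of all your data" means. The plot becomes unwieldy when the values span many stems or have many digits.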

### …and more ways of representing data

- Napoleon's campaign chart, 1812

Class sheet page 1

## Organizing and Summarizing Data: Summary

- Summaries of qualitative data
  - Frequency tables
  - Bar graphs
- Summaries of quantitative data
  - Frequency tables
  - Histograms
  - Pie graphs, time-series graphs, etc.
  - Cumulative frequencies, ogives, etc.