Organizing Data

# Descriptive Statistics

## Characteristics of data

Data is information; information is knowledge.

Humans collect information on a day-to-day basis.

• Companies collect information about customer satisfaction, the number of sales of a product per day, the number of items bought at a certain price, customer preferences, company profits and earnings, and customers' phone numbers, addresses, age and sex …
• Medical services and insurance companies collect information about patients' vital signs, medical history, insurance company, address, and the effects of prescriptions, among others. Doctors and pharmaceutical companies also run clinical trials to see whether certain medicines and drugs are effective, and under what conditions.
• Governments collect information about the nation's population, age, gender, salary range, …
• Scientists collect information on the conditions and the results of their experiments.
• Credit card companies and banks collect information about customers' salaries, the prices of their houses, the number and amount of loans they have, their addresses, buying preferences, …
• Amazon(TM) collects information about what you buy through their website, your recent searches, your favorite items, … When you log in again, the items you are likely to buy appear at the bottom of the screen as suggestions.

In the world we live in today, information is everywhere. With such a wealth of information, one sometimes can't see the forest for the trees.
For that reason, it is very important to organize information (data).

Some important measures to summarize information are:

• Center: a measure of the middle or average value. For example, when opening a store, the owner might be interested in the average age of the people who walk by, in order to tailor the products to that clientele.
• Variation: a measure of how much the data values vary. For example, in a clinical trial, one might be interested in how much the responses to a given drug vary. If the responses vary too much, then the drug would most likely not be accepted by the Food and Drug Administration.
• Distribution: the nature or shape of the distribution of the data. For example, when opening a teenage clothing store, the distribution of the people walking by is important as well. From the two graphical distributions given below,

the walk-by population in the red distribution is better for the business than the one in the blue distribution. Why?

• Outliers: values that lie very far from the rest of the data
• Time: how the characteristics of the data change over time
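
To make the first measures concrete, here is a minimal Python sketch using the standard `statistics` module. The ages are made-up values for a hypothetical store, and the "more than 2 standard deviations from the mean" test for outliers is only one common rule of thumb, assumed here for illustration; it is not a definition given in these notes.

```python
import statistics

# Hypothetical ages of people walking past a store (made-up data).
ages = [15, 16, 16, 17, 18, 19, 21, 24, 35, 71]

mean_age = statistics.mean(ages)      # center: arithmetic average
median_age = statistics.median(ages)  # center: middle value of the sorted data
spread = statistics.stdev(ages)       # variation: sample standard deviation

# Assumed rule of thumb: flag values more than 2 standard deviations
# away from the mean as possible outliers.
outliers = [a for a in ages if abs(a - mean_age) > 2 * spread]

print(f"mean = {mean_age:.1f}, median = {median_age}, stdev = {spread:.1f}")
print("possible outliers:", outliers)
```

Note how the single large value pulls the mean well above the median; this is one reason to look at more than one measure of center.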

## Data

### Frequency distributions

Here is an example of a frequency distribution, which is really just a table in which you tally how many items fall in each category.

| Cotinine level | Number of smokers |
|---|---|
| 0-99 | 11 |
| 100-199 | 12 |
| 200-299 | 14 |
| 300-399 | 1 |
| 400-499 | 2 |
| 500-599 | 0 |

Some definitions are useful:

• Lower class limits: the smallest values that can belong to each class; 0, 100, …
• Upper class limits: the largest values that can belong to each class; 99, …
• Class boundaries: numbers used to separate classes without gaps; 99.5, 199.5, …
• Class midpoints: the center of each class; 49.5, 149.5, …
• Class width: the difference between two consecutive lower (or upper) class limits; here, 100
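
As a quick check of these definitions, the following Python sketch recomputes the class width, midpoints and boundaries from the class limits of the cotinine table. The variable names are illustrative choices, and only the boundaries between adjacent classes are computed.

```python
# Class limits taken from the cotinine table above.
lower_limits = [0, 100, 200, 300, 400, 500]
upper_limits = [99, 199, 299, 399, 499, 599]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: average of each class's lower and upper limit.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Interior class boundaries: halfway between one class's upper limit
# and the next class's lower limit.
boundaries = [(hi + lo) / 2 for hi, lo in zip(upper_limits, lower_limits[1:])]

print("width:", class_width)
print("midpoints:", midpoints)
print("boundaries:", boundaries)
```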

### Constructing a frequency distribution

• Decide on the number of classes n, typically between 5 and 20.
• Compute the class width: (highest value - lowest value)/n, rounded up to a convenient number.
• Choose a starting point: the lowest data value, or a convenient value slightly smaller than it.
• List the lower class limits.
• List the upper class limits.
• Tally the data: count the data values falling in each class.
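
The steps above can be sketched in Python as follows. This is a minimal illustration for integer data, not a definitive implementation: the function name `frequency_distribution`, its `start` parameter and the sample values are all hypothetical, and the `+ 1` in the width calculation is an added assumption so that the highest value still falls inside the last class.

```python
import math

def frequency_distribution(data, n_classes, start=None):
    """Tally integer data into n_classes equal-width classes.

    Returns a list of (lower_limit, upper_limit, frequency) tuples.
    """
    lowest, highest = min(data), max(data)
    if start is None:
        start = lowest  # starting point: the lowest data value
    if start > lowest:
        raise ValueError("start must not exceed the lowest data value")

    # Class width: roughly (highest - lowest) / n, rounded up. The "+ 1" is
    # an assumption for integer data so the highest value fits in the last class.
    width = math.ceil((highest - start + 1) / n_classes)

    table = []
    for k in range(n_classes):
        lower = start + k * width   # lower class limit
        upper = lower + width - 1   # upper class limit
        count = sum(1 for x in data if lower <= x <= upper)  # tally
        table.append((lower, upper, count))
    return table

# Hypothetical raw measurements (made-up values for illustration).
levels = [1, 35, 87, 112, 130, 164, 198, 210, 245, 253, 289, 313, 477, 499]

for lower, upper, count in frequency_distribution(levels, n_classes=5, start=0):
    print(f"{lower}-{upper}: {count}")
```

Passing `start=0` picks a convenient starting point just below the lowest value, which here yields classes of width 100.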

### Cumulative frequency distribution
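
In a cumulative frequency distribution, the frequency listed for each class is the running total of the frequencies up to and including that class. A minimal Python sketch, using the frequencies from the cotinine table and the standard `itertools.accumulate` function (the labels printed are an illustrative choice):

```python
from itertools import accumulate

# Upper class limits and frequencies from the cotinine table above.
upper_limits = [99, 199, 299, 399, 499, 599]
frequencies = [11, 12, 14, 1, 2, 0]

# Running totals of the class frequencies.
cumulative = list(accumulate(frequencies))

for upper, total in zip(upper_limits, cumulative):
    print(f"Less than {upper + 1}: {total}")
```

The last cumulative frequency always equals the total number of data values.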

## Visualizing data

### Histogram

A histogram is a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies.
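
For a rough idea of what this looks like, here is a small Python sketch that prints a text version of a histogram for the cotinine table. It is drawn sideways (classes run down the page and bars extend to the right) purely because that is easier to print; the helper name `text_histogram` is hypothetical.

```python
# Classes and frequencies from the cotinine table above.
classes = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-599"]
frequencies = [11, 12, 14, 1, 2, 0]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per class: the label followed by a bar of `count` symbols."""
    label_width = max(len(label) for label in labels)
    return [
        f"{label:>{label_width}} | {symbol * count}"
        for label, count in zip(labels, counts)
    ]

for line in text_histogram(classes, frequencies):
    print(line)
```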

### (Relative) frequency histograms, polygons and ogives
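
A relative frequency is a class frequency divided by the total number of data values, so the relative frequencies of all classes add up to 1. As a small sketch, again using the cotinine table (variable names are illustrative):

```python
# Classes and frequencies from the cotinine table above.
classes = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-599"]
frequencies = [11, 12, 14, 1, 2, 0]

total = sum(frequencies)

# Relative frequency: class frequency divided by the total count.
relative = [f / total for f in frequencies]

for label, r in zip(classes, relative):
    print(f"{label}: {r:.1%}")
```

A relative frequency histogram has the same shape as the frequency histogram; only the vertical scale changes.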

### Other ways of representing data

• Dot plot: find out what this is!
• Stem-and-leaf plot
  • keeps track of all your data
  • only works in certain specific cases
  • condensed stem-and-leaf plot
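
A stem-and-leaf plot can be sketched in a few lines of Python. This version assumes non-negative integers, uses the last digit as the leaf and the remaining digits as the stem; the function name and the scores are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers.

    The stem is everything but the last digit; the leaf is the last digit.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)

    # Include empty stems so that gaps in the data stay visible.
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical test scores (made-up data for illustration).
scores = [52, 58, 61, 67, 67, 70, 73, 75, 79, 94]

for line in stem_and_leaf(scores):
    print(line)
```

Every original value can be read back from the plot (stem 6 with leaves 1, 7, 7 means 61, 67, 67), which is the sense in which it keeps track of all the data.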

### …and more ways of representing data

• Napoleon’s campaign chart, 1812
Class sheet page 1

## Organizing and Summarizing Data: Summary

• Summaries of qualitative data
  • Frequency tables
  • Bar graphs
• Summaries of quantitative data
  • Frequency tables
  • Histograms
• Pie graphs, time-series graphs, etc.
• Cumulative frequencies, ogives, etc.

page revision: 14, last edited: 07 Nov 2008 15:51