Organizing Data

Descriptive Statistics

Characteristics of data

Data is information; information is knowledge.

Humans collect information on a day-to-day basis.

  • Companies collect information about customer satisfaction, number of sales of a product per day, number of items bought at a certain price, customer preferences, profits and earnings of the company, customers' phone numbers, addresses, age and sex, …
  • Medical services and insurance companies collect information about patients' vital signs, medical history, insurance company, address, and the effects of prescriptions, among others. Doctors and pharmaceutical companies also run clinical trials to see whether certain medicines and drugs are effective and under what conditions.
  • Governments collect information about the nation's population, age, gender, salary range, …
  • Scientists collect information on the conditions and the results of their experiments.
  • Credit card companies and banks collect information about their customers' salaries, the prices of their houses, the number and amount of loans they have, their addresses, buying preferences, …
  • Amazon(TM) collects information about what you buy through its website, your recent searches, your favorite items, … When you log in again, the items you are likely to buy appear at the bottom of the screen as suggestions.
  • Google
    • Google collects information about the searches you do from their website and adds sponsored links of websites where you could buy a related item. Google gets paid every time you click on those sponsored links.
    • Gmail searches for keywords in your emails and adds advertisements on the blue bar at the top of your email.

In the world we live in today, information is everywhere. With such a wealth of information, one sometimes can't see the forest for the trees.
For that reason, it is very important to organize information (data).

Some important measures to summarize information are:

  • Center: a measure of the middle or average value. For example, when opening a store, the owner might be interested in the average age of the people who walk by, in order to tailor the products to that clientele.
  • Variation: a measure of how much the data values vary. For example, in a clinical trial, one might be interested in how much the responses to a given drug vary. If the responses vary too much, the drug would most likely not be accepted by the Food and Drug Administration.
  • Distribution: the nature or shape of the distribution of the data. For example, when opening a teenage clothing store, the distribution of the people walking by is important as well. Consider the two distributions in the figure below.

[Figure: two distributions of the walk-by population, one red and one blue (flickr:2968998009)]

The walk-by population in the red distribution is better for the business than the one in the blue distribution. Why?

  • Outliers: values that lie very far from most of the other data values.
  • Time: how the characteristics of the data change over time.
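
As a rough illustration of center, variation and outliers, the sketch below uses Python's standard statistics module on a made-up list of ages. The data and the "more than 2 standard deviations from the mean" rule for flagging outliers are assumptions chosen for the example, not part of these notes.

```python
import statistics

# Hypothetical ages of people walking by a store (made-up data).
ages = [15, 16, 16, 17, 18, 19, 21, 24, 62]

mean_age = statistics.mean(ages)      # center: arithmetic average
median_age = statistics.median(ages)  # center: middle value
spread = statistics.stdev(ages)       # variation: sample standard deviation

# Assumed rule of thumb: flag values more than 2 standard
# deviations away from the mean as possible outliers.
outliers = [a for a in ages if abs(a - mean_age) > 2 * spread]

print("mean:", round(mean_age, 2))
print("median:", median_age)
print("standard deviation:", round(spread, 2))
print("possible outliers:", outliers)
```

Note how the single large value pulls the mean well above the median; this is one reason to look at more than one measure of center.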


Frequency distributions

Here is an example of a frequency distribution, which is really just a table in which you tally how many items fall in each category.

Cotinine level    # of smokers
0-99              11
100-199           12
200-299           14
300-399            1
400-499            2
500-599            0

Some definitions are useful:

  • Lower class limits: the smallest values that can belong to each class: 0, 100, …
  • Upper class limits: the largest values that can belong to each class: 99, …
  • Class boundaries: the numbers used to separate classes without gaps: 99.5, 199.5, …
  • Class midpoints: the center of each class: 49.5, 149.5, …
  • Class width: the difference between two consecutive lower (or upper) class limits: 100
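
The definitions above can be checked against the cotinine table with a few lines of Python. This is only a sketch; the variable names are chosen for the example.

```python
# Class limits copied from the cotinine table above.
classes = [(0, 99), (100, 199), (200, 299), (300, 399), (400, 499), (500, 599)]

lower_limits = [lo for lo, hi in classes]
upper_limits = [hi for lo, hi in classes]

# Boundaries between neighboring classes: halfway between one class's
# upper limit and the next class's lower limit.
boundaries = [(hi + next_lo) / 2
              for (_, hi), (next_lo, _) in zip(classes, classes[1:])]

# Midpoint of each class: average of its lower and upper limits.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Class width: difference between two consecutive lower class limits.
width = lower_limits[1] - lower_limits[0]

print(boundaries)  # [99.5, 199.5, 299.5, 399.5, 499.5]
print(midpoints)   # [49.5, 149.5, 249.5, 349.5, 449.5, 549.5]
print(width)       # 100
```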

Constructing a frequency distribution

  • Decide on the number of classes n: usually between 5 and 20.
  • Class width = (highest value - lowest value)/n, rounded up to a convenient number.
  • Starting point: the lowest data value, or a convenient smaller value.
  • List the lower class limits.
  • List the upper class limits.
  • Tally the data: count the data values falling in each class.
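
The steps above can be sketched in Python as follows. The function name frequency_distribution and the sample values are hypothetical, and the sketch assumes whole-number data. The class width is pushed up to the next whole number so that the highest value still fits in the last class.

```python
def frequency_distribution(data, n_classes):
    """Tally whole-number data into n_classes equal-width classes."""
    lowest, highest = min(data), max(data)
    # Class width = (highest - lowest) / n, pushed up to the next whole
    # number so the highest value still fits in the last class.
    width = (highest - lowest) // n_classes + 1
    start = lowest  # starting point: here simply the lowest data value
    table = []
    for i in range(n_classes):
        lower = start + i * width   # lower class limit
        upper = lower + width - 1   # upper class limit
        count = sum(lower <= x <= upper for x in data)  # tally
        table.append((lower, upper, count))
    return table

# Hypothetical measurements (made-up data, not from these notes).
values = [3, 7, 12, 18, 21, 25, 25, 30, 34, 41, 47, 49]

for lower, upper, count in frequency_distribution(values, 5):
    print(f"{lower}-{upper}: {count}")
```

In practice one would often pick a rounder starting point and width (for example 0 and 10) to make the table easier to read.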

Cumulative frequency distribution
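
In a cumulative frequency distribution, each entry is the running total of the frequencies up to and including that class. A minimal sketch using the cotinine table above (the "Less than" labels are one common way to present it):

```python
from itertools import accumulate

# Upper class limits and frequencies from the cotinine table above.
upper_limits = [99, 199, 299, 399, 499, 599]
frequencies = [11, 12, 14, 1, 2, 0]

# Running totals of the frequencies.
cumulative = list(accumulate(frequencies))

for upper, total in zip(upper_limits, cumulative):
    print(f"Less than {upper + 1}: {total}")
```

The last cumulative frequency always equals the total number of data values, here 40.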

Visualizing data

A histogram is a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies.
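
As a quick sketch, the cotinine table above can be drawn as a text histogram. Note that this version is turned on its side: the classes run down the page and the bar lengths show the frequencies. The function name text_histogram is made up for this example.

```python
# Classes and frequencies from the cotinine table above.
classes = ["0-99", "100-199", "200-299", "300-399", "400-499", "500-599"]
frequencies = [11, 12, 14, 1, 2, 0]

def text_histogram(labels, counts):
    """Return one line per class: label, a bar of '#' marks, and the count."""
    label_width = max(len(label) for label in labels)
    return [f"{label:>{label_width}} | {'#' * count} {count}"
            for label, count in zip(labels, counts)]

for line in text_histogram(classes, frequencies):
    print(line)
```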

(Relative) frequency histograms, polygons and ogives
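
A relative frequency is a class frequency divided by the total number of data values, so a relative frequency histogram has the same shape as the frequency histogram but a different vertical scale. A short sketch with the cotinine frequencies from the table above:

```python
# Frequencies from the cotinine table above.
frequencies = [11, 12, 14, 1, 2, 0]
total = sum(frequencies)

# Relative frequency = class frequency / total number of data values.
relative = [count / total for count in frequencies]
percentages = [round(100 * r, 1) for r in relative]

print(total)        # 40
print(percentages)  # [27.5, 30.0, 35.0, 2.5, 5.0, 0.0]
```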

Other ways of representing data

  • Dot plot: find out what this is!
  • Stem-and-leaf plot
    • keeps track of all your data
    • only works in certain specific cases
    • condensed stem-and-leaf plot
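
A minimal sketch of a stem-and-leaf plot, assuming non-negative whole numbers with the tens as stems and the units digits as leaves. The function name stem_and_leaf and the data are made up for this example.

```python
def stem_and_leaf(data):
    """Return lines like '2 | 1 5 5' for non-negative whole numbers."""
    stems = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # stem: tens, leaf: units digit
        stems.setdefault(stem, []).append(leaf)
    lines = []
    # Include every stem between the smallest and largest, even if empty.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical measurements (made-up data, not from these notes).
values = [3, 7, 12, 18, 21, 25, 25, 30, 34, 41, 47, 49]

for line in stem_and_leaf(values):
    print(line)
```

Every original value can be read back from the plot, which is what "keeps track of all your data" refers to.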

…and more ways of representing data

  • Napoleon’s campaign chart, 1812
  • Class sheet page 1

Organizing and Summarizing Data: Summary

  • Summaries of qualitative data
    • Frequency tables
    • Bar graphs
  • Summaries of quantitative data
    • Frequency tables
    • Pie graphs, time-series graphs, etc.
    • Cumulative frequencies, ogives, etc.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License