Some rules of probability


A random experiment is the process by which an observation is obtained. It is also called a procedure. The word random indicates that the outcome of the experiment cannot be known in advance. Examples include:


  • tossing a coin and observing the outcome
  • tossing a die and observing the outcome
  • measuring daily rainfall
  • recording a test grade

A simple event is the outcome that is observed on a single repetition of a random experiment.


  • when tossing a coin, the simple events are: Heads or Tails
  • when tossing a die, they are 1, 2, 3, 4, 5, or 6
  • when tossing two dice, one simple event is 1-1, a different one is 1-2, and a third one is 2-1 (the order matters, so 1-2 and 2-1 are distinct). Note that there are 36 such simple events

An event is a collection of simple events.


  • when tossing a die, one event is "the outcome is even", and another is "the outcome is larger than 4". The latter consists of the simple events 5 and 6.
  • when tossing two dice, one event is "the sum of the faces is equal to 5", which consists of the following simple events: 1-4, 2-3, 3-2, and 4-1.
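The two-dice examples above can be checked by listing the sample space directly. The following is a minimal sketch in Python; the variable names `sample_space` and `sum_is_five` are chosen here for illustration only.

```python
from itertools import product

# Sample space for tossing two dice: every ordered pair (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# The event "the sum of the faces is equal to 5" is a collection of simple events.
sum_is_five = [outcome for outcome in sample_space if sum(outcome) == 5]

print(len(sample_space))  # 36
print(sum_is_five)        # [(1, 4), (2, 3), (3, 2), (4, 1)]
```

Each tuple is one simple event, and the event itself is just the list of simple events that satisfy the condition.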

Two events are mutually exclusive if, when one event occurs, the other one cannot occur, and vice versa.


  • when tossing one die, if A="the outcome is odd", and B="the outcome is even", then A and B are mutually exclusive.

The set of all simple events is called the sample space, which is sometimes denoted by S and sometimes by Omega $\Omega$.

Event relations and Venn diagrams

A Venn diagram is a useful way to represent events graphically. The sample space is represented by the encompassing rectangle, while the events are usually circular and smaller than the whole sample space, as in the example below.

The union of the events A and B, denoted by $A\cup B$, is the event that either A or B or both occur. This can be seen in the Venn diagram below:


The intersection of the events A and B, denoted by $A\cap B$, is the event that both A and B occur. The graphical representation of this event is given below:



The complement of an event A, sometimes denoted by $A'$, sometimes by $A^C$, and sometimes by $\bar{A}$, is the event that A does not occur. It is depicted by the shaded area below:
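Since events are collections of simple events, union, intersection and complement correspond to ordinary set operations. The sketch below uses Python sets for one toss of a die; the names `S`, `A`, `B` and `C` are illustrative assumptions that reuse the die events mentioned earlier.

```python
# Sample space for one toss of a die.
S = {1, 2, 3, 4, 5, 6}

A = {2, 4, 6}  # "the outcome is even"
B = {5, 6}     # "the outcome is larger than 4"
C = {1, 3, 5}  # "the outcome is odd"

union = A | B          # A or B or both occur
intersection = A & B   # both A and B occur
complement_A = S - A   # A does not occur

print(sorted(union))         # [2, 4, 5, 6]
print(sorted(intersection))  # [6]
print(sorted(complement_A))  # [1, 3, 5]

# A and C are mutually exclusive: they share no simple events.
print(A.isdisjoint(C))       # True
```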


These definitions help us to find the following rules for calculating probabilities of unions and intersections.

NOTE: Section 5.4 on odds will not be covered.

The addition rule

The probability of the union of two events, $P(A\cup B)$, is given by

\begin{align} P(A\cup B) = P(A) + P(B) -P(A\cap B) \end{align}

Example: When drawing one card out of a deck of 52 playing cards, what is the probability of getting a face card (king, queen or jack) or a heart?

Let H denote drawing a heart and F denote drawing a face card. There are 13 hearts and a total of 12 face cards (3 in each suit: spades, hearts, diamonds and clubs), but only 3 of the face cards are hearts, so we obtain

  • P(H) = 13/52
  • P(F) = 12/52
  • P(F$\cap$H) = 3/52

and using the addition rule, we get

\begin{align} P(H\cup F) = P(H)+P(F)-P(H\cap F) = \frac{13}{52}+\frac{12}{52}-\frac{3}{52} = \frac{22}{52} \approx 0.42. \end{align}

The last term is subtracted because otherwise the outcomes in the overlap of the two events (here, the 3 face cards that are hearts) would be counted twice. If A and B are mutually exclusive (also called disjoint, since they do not overlap), then this last probability is zero.
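The card example can be verified by counting. The sketch below builds a 52-card deck and compares a direct count of the union with the addition rule; the helper `prob` and the string labels for ranks and suits are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

hearts = {card for card in deck if card[1] == "hearts"}
faces = {card for card in deck if card[0] in {"J", "Q", "K"}}

def prob(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(len(event), len(deck))

lhs = prob(hearts | faces)                               # direct count of the union
rhs = prob(hearts) + prob(faces) - prob(hearts & faces)  # addition rule

print(lhs, rhs)  # 11/26 11/26
```

Both sides equal 22/52, which `Fraction` reduces to 11/26.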

Addition rule for disjoint events

Therefore, when A and B are mutually exclusive, then $P(A\cap B)=0$ and

\begin{align} P(A\cup B) = P(A) + P(B), \ \ \ \ \mbox{when } A\cap B = \emptyset. \end{align}

The symbol $\emptyset$ represents the empty set, which means that in this case A and B do not have any elements in common (do not overlap).

An extension of this rule says that, when $A_1, A_2, \dots, A_k$ are pairwise disjoint events (no two of them overlap), then

\begin{align} P(A_1\cup A_2\cup\dots\cup A_k) = \sum_{i=1}^k P(A_i)= P(A_1) + P(A_2)+\dots+P(A_k). \end{align}
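As a small worked illustration (the events are chosen here for the example and do not come from the text above), draw one card from a deck of 52 and let $A_1$, $A_2$ and $A_3$ be the events of drawing an ace, a king and a queen, respectively. No card belongs to two of these events, so they are disjoint and

\begin{align} P(A_1\cup A_2\cup A_3) = \frac{4}{52}+\frac{4}{52}+\frac{4}{52} = \frac{12}{52} = \frac{3}{13} \approx 0.23. \end{align}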

Conditional Probability

The conditional probability of the event A relative to the sample space S (also called the conditional probability of A given S) is denoted by P(A|S). The notation makes explicit the sample space over which the probability is calculated.

Example: Consider the effectiveness of a pregnancy test that shows a Y when the result is positive and an X when the result is negative. The test was checked on 150 subjects, with the following results.

| | Test positive (Y) | Test negative (X) | Totals |
| Subject Pregnant | 105 | 15 | 120 |
| Subject Not Pregnant | 10 | 20 | 30 |
| Totals | 115 | 35 | 150 |

If one subject in the experiment is randomly selected (each subject having probability 1/150 of being chosen), we find that the probability that the subject is pregnant (denoted by p) is

\begin{align} P(p) = \frac{120}{150} = 0.80, \end{align}

since there is a total of 120 subjects who are pregnant: 105 of them tested positive (a good test result, called a true positive) and 15 of them tested negative although they are pregnant (a bad test result, called a false negative).

Meanwhile, the probability of a subject testing positive, denoted Y as the test would indicate, is given by

\begin{align} P(Y) = \frac{115}{150} \approx 0.77 \end{align}

since there is a total of 115 subjects who tested positive: the 105 mentioned above who are pregnant and tested positive, and 10 more who tested positive but are not pregnant (a bad test result, called a false positive).

The last category in the table represents the 20 subjects who are not pregnant and who tested negative. This is a good test result, called a true negative.

The probability of choosing a subject with a true positive result (one who is pregnant and tested positive) is given by

\begin{align} P(p\cap Y) = \frac{105}{150} = 0.70 \end{align}

All of these probabilities assume that every subject is equally likely to be selected, so in that sense we are using the classical approach to probability. Because the counts themselves come from an experiment, the resulting values can also be read as relative frequencies.

An interesting question is: what is the probability that a person tests positive, given that the person is actually pregnant? This is a conditional probability, given by

\begin{align} P(Y | p ) = \frac{105}{120} = 0.875, \end{align}

which is higher than the probability of testing positive in the whole sample, $P(Y)\approx 0.77$, as might be expected.

Note that this conditional probability can also be obtained as follows:

\begin{align} P(Y | p ) = \frac{ \frac{105}{150}}{\frac{120}{150}} = \frac{P(Y \cap p)}{P(p)} \end{align}

which is the ratio of the probability of choosing a subject who is pregnant and tested positive to the probability of choosing a subject who is pregnant.
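The calculations for the pregnancy-test table can be reproduced from the four cell counts. This is a minimal sketch; the dictionary layout, the string labels and the helper `prob` are assumptions introduced for the illustration.

```python
from fractions import Fraction

# Cell counts from the table: (subject status, test result) -> number of subjects.
counts = {
    ("pregnant", "positive"): 105,
    ("pregnant", "negative"): 15,
    ("not pregnant", "positive"): 10,
    ("not pregnant", "negative"): 20,
}
total = sum(counts.values())  # 150 subjects

def prob(condition):
    """Probability that a randomly selected subject satisfies condition(status, result)."""
    favourable = sum(n for (status, result), n in counts.items()
                     if condition(status, result))
    return Fraction(favourable, total)

p_pregnant = prob(lambda status, result: status == "pregnant")    # 120/150
p_positive = prob(lambda status, result: result == "positive")    # 115/150
p_both = prob(lambda status, result: status == "pregnant" and result == "positive")  # 105/150

# Conditional probability as a ratio of two probabilities.
p_positive_given_pregnant = p_both / p_pregnant

print(p_positive_given_pregnant)         # 7/8
print(float(p_positive_given_pregnant))  # 0.875
```

Dividing the two probabilities cancels the common denominator of 150, which is why the result agrees with the direct count 105/120.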

In general, the conditional probability can then be defined as follows:

If $P(B)\neq 0$ then the conditional probability of A relative to B is given by

\begin{align} P(A | B ) = \frac{P(A \cap B)}{P(B)} \end{align}

If

\begin{equation} P(A | B ) = P(A) \end{equation}

then we say that A and B are independent events.
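To see the independence condition in action, the sketch below compares P(A|B) with P(A) for events on a single toss of a die. The events and the helpers `prob` and `cond_prob` are assumptions chosen for this illustration.

```python
from fractions import Fraction

# Sample space for one toss of a fair die.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all six faces are equally likely."""
    return Fraction(len(event), len(S))

def cond_prob(a, b):
    """Conditional probability P(a | b); requires P(b) != 0."""
    return prob(a & b) / prob(b)

even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_most_three = {1, 2, 3}

print(prob(even))                      # 1/2
print(cond_prob(even, at_most_four))   # 1/2  -> independent of "at most 4"
print(cond_prob(even, at_most_three))  # 1/3  -> not independent of "at most 3"
```

Knowing that the outcome is at most 4 leaves the chance of an even outcome unchanged, while knowing that it is at most 3 lowers it.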

Multiplication rules

Multiplying both sides of the definition of conditional probability by the denominator we obtain the general multiplication rule

\begin{align} P(A\cap B) = P(B)P(A | B ) \end{align}

which can be written alternatively as:

\begin{align} P(A\cap B) = P(A)P(B | A ) \end{align}

In words, the second formula says that

the probability that A and B occur is equal to the probability that A occurs times the probability that B occurs, given that we know A occurred already.

Example: Suppose that we draw two cards, one after the other and without replacement, out of a deck of 52 cards, and let A = {first card is an ace} and B = {second card is an ace}. Then

\begin{align} P(A) = \frac{4}{52} \end{align}


\begin{align} P(B|A) = \frac{3}{51} \end{align}

since one card has already been drawn, so there are 51 cards left in total, and the first card was an ace, so only 3 aces remain. Therefore:

\begin{align} P(A\cap B) = P(A)P(B | A ) = \frac{4}{52}\cdot\frac{3}{51} = \frac{12}{2652} \approx 0.0045 \end{align}
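The two-aces result can be cross-checked by listing every ordered pair of distinct cards. The sketch below assumes a simplified deck in which only the distinction between aces and other cards matters; the labels are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Simplified deck: 4 aces and 48 other cards, each card individually labelled.
deck = [("ace", i) for i in range(4)] + [("other", i) for i in range(48)]

# All ordered draws of two different cards (drawing without replacement).
draws = list(permutations(deck, 2))

both_aces = [d for d in draws if d[0][0] == "ace" and d[1][0] == "ace"]

p_enumerated = Fraction(len(both_aces), len(draws))  # direct count
p_rule = Fraction(4, 52) * Fraction(3, 51)           # multiplication rule

print(len(draws))            # 2652
print(p_enumerated, p_rule)  # 1/221 1/221
```

Both approaches give 12/2652 = 1/221, which is approximately 0.0045.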

Bayes Rule

The general multiplication rule can be written in two ways:

\begin{align} P(A\cap B) = P(A)P(B | A ) \end{align}


\begin{align} P(B\cap A) = P(B)P(A | B ). \end{align}

Since $A\cap B$ and $B\cap A$ are the same event, the left sides of these expressions are equal, and therefore so are the right sides:


\begin{equation} P(A)P(B | A )=P(B)P(A | B ) , \end{equation}

and dividing both sides by P(A), assuming $P(A)\neq 0$, we obtain

\begin{align} P(B | A )=\frac{P(B)P(A | B )}{P(A)} , \end{align}

which is one form of Bayes rule.

A generalization, for the case in which the sample space is the union of mutually exclusive events $B_1,B_2, \dots, B_k$, is given by

\begin{align} P(B_i | A )=\frac{P(B_i)P(A | B_i )}{\sum\limits_{j=1}^kP(B_j)P(A | B_j )} = \frac{P(B_i)P(A | B_i )}{P(B_1)P(A | B_1 )+\dots+P(B_k)P(A | B_k )} \end{align}
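The generalized formula can be sketched as a short function and applied to the pregnancy-test table from the conditional probability section, taking $B_1$ = pregnant, $B_2$ = not pregnant and A = test positive. The function name `bayes` and its argument layout are assumptions made for this illustration.

```python
from fractions import Fraction

def bayes(priors, likelihoods, i):
    """Return P(B_i | A) from the priors P(B_j) and the likelihoods P(A | B_j).

    The events B_j are assumed to be mutually exclusive and to cover the
    whole sample space, so the denominator is P(A).
    """
    p_a = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[i] * likelihoods[i] / p_a

# B_1 = pregnant, B_2 = not pregnant; A = test positive (counts from the table).
priors = [Fraction(120, 150), Fraction(30, 150)]      # P(B_1), P(B_2)
likelihoods = [Fraction(105, 120), Fraction(10, 30)]  # P(A | B_1), P(A | B_2)

posterior = bayes(priors, likelihoods, 0)
print(posterior)  # 21/23
```

The result, 21/23, is the same as the direct count 105/115: of the 115 subjects who tested positive, 105 are pregnant.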