Fu Jen Catholic University Course Syllabus, Academic Year 96 (2007–2008)
Offering department: Department of Electronic Engineering, 2nd year
Course code: D-5002-02802
Credits: 3
Course title (English/Chinese): Fundamentals of Probability (機率學)
Instructor (English/Chinese): Yu, Jung-Lang (余金郎)
Course Objectives
The aim of this course is to present probability in a natural way, through interesting and instructive examples and exercises that motivate the theory, definitions, theorems, and methodology. Students will: (1) acquire an understanding of the basic concepts of probability; (2) become interested in the theory of probability; (3) improve their problem-solving skills and develop their proof-writing techniques.
Course Outline
1. Axioms of Probability. (0.5 week)
2. Combinatorial Methods. (0.5 week)
3. Conditional Probability and Independence. (2.0 weeks)
4. Distribution Functions and Discrete Random Variables. (2.0 weeks)
5. Special Discrete Distributions. (1.0 week)
6. Continuous Random Variables. (1.0 week)
7. Special Continuous Distributions. (2.0 weeks)
8. Joint Distributions. (2.5 weeks)
9. More Expectations and Variances. (1.5 weeks)
10. Sums of Independent Random Variables and Limit Theorems. (1.0 week)
Prerequisites: None
Suggested follow-up course: Signals and Systems
Textbook
S. Ghahramani, Fundamentals of Probability, 3rd Edition, 2005 (新月)
References
R. E. Ziemer, Elements of Engineering Probability and Statistics, 2nd Edition, 1997 (全華)
P. L. Meyer, Introductory Probability and Statistical Applications, 2nd Edition, 1970 (中央)
Remarks: Quizzes 40%, Midterm 25%, Final 35%
Probability Theory. Chapter 0: Introduction 1/4
Department of Electronic Engineering, Fu Jen Catholic University August 13, 2007
Introduction to Probability Theory
Instructor: Dr. Jung-Lang Yu (余金郎). Office: SF709, Tel: 29052102
Lectures: Mon. 08:10-09:00, room SF338, Tue. 10:10-12:00, room SF132.
Office hours: Thu. 14:00-16:00, Fri. 14:00-16:00, room SF709 (by appointment if necessary).
E-mail: [email protected]
Textbook: Fundamentals of Probability by Saeed Ghahramani, 3rd edition, Prentice-
Hall, Inc. The course will cover some but not all of the material.
Material Download: http://www.ee.fju.edu.tw/communication/main.html
Prerequisites: Calculus.
Course description: Basic notions of probability, discrete and continuous probability distributions, expectation and conditional expectation, independence, limit theorems and notions of convergence.
My role in this course is to help you learn and understand the material. Please come and see me in my office hours right away if you have trouble with the material, so we can intervene early. Mathematics is not a collection of disassociated pieces; its parts build upon each other. Mastering mathematics is not very different from mastering a sport: you need to practice to get the moves right, and then you need to understand the dynamics to put the different moves together into a coherent strategy. Your part consists of
♦ doing the homework (math is learned by doing, not by watching). It has been shown that Time-on-Task (TOT) (which consists of taking notes, re-reading the material after class, doing homework and studying) has the highest correlation with the final grade!!
♦ asking if you do not understand. Asking does not mean that you are stupid; rather, it means that you are smart enough to realize that clarifying what bothers you is an important step in doing well in this (or any) class. For most questions there will be many grateful classmates who had the same question but did not have the guts to ask.
♦ taking advantage of my office hours. This time is set aside for you to come and ask questions. You can make the most of it by coming prepared: bring your notes with marks on where you have questions, and show me where you get stuck in solving a particular problem. I can assist you best when you have specific questions.
In terms of studying, I urge you to work with other students. Even if your schedule is such that you cannot meet with somebody else, identify somebody whom you can call. Often hours of frustration can be avoided by talking to somebody else. Think about the problems by yourself first, and then ask about the ones you had trouble with.
In preparation for class, you should read ahead in the material (find the places where the topics appear in the book and mark them in red pen). This way you can pay particular attention to anything that did not make sense to you. In addition, it is a good idea to keep a notebook in which you record definitions, theorems, and formulas (this will be a good summary for review before exams and will come in handy in any course in which you may need this material). You may also want to summarize the sections/chapters to help in studying for exams. Another use of this notebook is to record which parts of the material/homework questions you had trouble with. Often, difficulties clear up when you write them down. I will not collect or check this notebook, but that should not keep you from making entries.
Course Grade:
Your final grade will be computed as follows:
Quiz 40% (4 quizzes during the semester)
Midterm Exam 25%
Final Exam 35%
Attendance (I expect you to be on time and to stay until the end of class!) and class participation will be taken into account in borderline cases.
Important dates:
Midterm Exam: 12-16 November, 2007
Final Exam: 14-18 January, 2008
Quiz:
1st: Oct. 18, 2007
2nd: Nov. 07, 2007
3rd: Dec. 13, 2007
4th: Jan. 09, 2008
Homework: The minimum set of exercises essential for the course is listed below.
♦ Chapter 1:
Section 1.2 1,3,5,7 Section 1.4 1,3,7,9,19 Section 1.7 1,5,10
♦ Chapter 2: Section 2.2 1,3,5,8,9,11,17 Section 2.3 1,5,9 Section 2.4 1,3,5,11,13,15,21,37
♦ Chapter 3: Section 3.1 1,3,5,11 Section 3.2 1 Section 3.3 1,3,7 Section 3.4 1,3,7,9 Section 3.5 3,5,7,9,15,17,25,27,34,35
♦ Chapter 4: Section 4.2 1,5,11 Section 4.3 1,3,7,11 Section 4.4 3,5,7 Section 4.5 1,3,5,7 Section 4.6 1
♦ Chapter 5: Section 5.1 1,3,9,13,17,19 Section 5.2 1,3,5,7,9 Section 5.3 1,5,13,17
♦ Chapter 6: Section 6.1 1,3,5,7,9 Section 6.2 3,7 Section 6.3 2,3,4,5,7 Review 9,12
♦ Chapter 7: Section 7.1 1,5,11 Section 7.2 1,9,11,15,17,21 Section 7.3 3,5,6
♦ Chapter 8: Section 8.1 1,5,7,9,11,13,15,17,25 Section 8.2 1,3,5,9,13,15,17 Section 8.3 1,5,7,11,13
♦ Chapter 10: Section 10.1 3,5,7 Section 10.2 1,3,5,9,11 Section 10.3 1,3,5 Section 10.4 1,3,5,19
Section 10.5 1,3,5
♦ Chapter 11: Section 11.1 1,3,4,11,13,15,17 Section 11.2 1,11,15 Section 11.3 1,3,5,7 Section 11.4 1 Section 11.5 1,3,5
Chapter 1. Axioms of Probability
1.1. Introduction
Probability
Relative frequency interpretation
(No. of desired outcomes) / (No. of experiments)
Counting the number of different ways that a certain event can occur
P(A) = lim_{n→∞} n(A)/n, where n(A) is the number of times that A occurs in the first n repetitions of the experiment
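As a sketch of the relative-frequency interpretation, the following simulation estimates P(A) as n(A)/n. The particular event (a fair die showing a six) and the number of trials are illustrative choices, not from the text:

```python
import random

def relative_frequency(n, run_experiment, occurred):
    """Estimate P(A) as n(A)/n, the fraction of n trials in which A occurs."""
    n_a = sum(1 for _ in range(n) if occurred(run_experiment()))
    return n_a / n

random.seed(0)
# Event A: a fair die shows a six; the true probability is 1/6.
estimate = relative_frequency(100_000,
                              lambda: random.randint(1, 6),
                              lambda outcome: outcome == 6)
```

As n grows, the estimate converges to 1/6, in line with the limit above.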
1.2. Sample Space and Events
Sample Space
Set of all possible outcomes of a random experiment
Usually denoted by S
Outcomes: sample points of S
Event
Subset of S
Set of outcome(s)
Example 1.1 For the experiment of tossing a coin once, the sample space S consists of two points (outcomes), “heads” (H) and “tails” (T). Thus S = {H, T}.
Example 1.2 Suppose that an experiment consists of two steps. First a coin is flipped. If the outcome is tails, a die is tossed. If the outcome is heads, the coin is flipped again. The sample space of this experiment is S = {T1, T2, T3, T4, T5, T6, HT, HH}. For this experiment, the event of heads in the first flip of the coin is E = {HT, HH}, and the event of an odd outcome when the die is tossed is F = {T1, T3, T5}.
Relations between events
Subset
E is said to be a subset of F if, whenever E occurs, F also occurs: E ⊆ F
Equality
E and F are said to be equal if the occurrence of E implies the occurrence of F, and vice versa.
If E ⊆ F and F ⊆ E, then E = F.
Intersection
An event is called the intersection of two events E and F if it occurs only whenever E and F occur simultaneously.
E ∩ F (also written EF) = {x | x ∈ E and x ∈ F}
Union
An event is called the union of two events E and F if it occurs whenever at least one of them occurs.
E ∪ F = {x | x ∈ E or x ∈ F}
Complement
An event is called the complement of the event E if it only occurs whenever E does not occur.
E^c = S − E = {x | x ∉ E}
Difference: A − B = {x | x ∈ A and x ∉ B} = A ∩ B^c
Certain Event
An event is called certain if its occurrence is inevitable
S
Impossible Event
An event is called impossible if there is certainty in its nonoccurrence: ∅ (the empty set) = S^c.
Mutually Exclusive (disjoint)
If the joint occurrence of two events E and F is impossible, we say that E and F are mutually exclusive (disjoint).
EF = ∅. A set of events {E_1, E_2, …, E_n} is called mutually exclusive if the joint occurrence of any two of them is impossible, that is, E_iE_j = ∅ for all i ≠ j.
Example 1.7 At a busy international airport, arriving planes land on a first-come, first-served basis. Let E, F, and H be the events that there are, respectively, at least five, at
most three, and exactly two planes waiting to land. Then
♦ Ec is the event that at most four planes are waiting to land.
♦ Fc is the event that at least four planes are waiting to land.
♦ E is a subset of Fc; that is, if E occurs, then Fc occurs. Therefore, EFc = E.
♦ H is a subset of F; that is, if H occurs, then F occurs. Therefore, FH = H.
♦ E and F are mutually exclusive; that is, EF = ∅ . E and H are also mutually exclusive since EH =∅ .
♦ FHc is the event that the number of planes waiting to land is zero, one or three.
Venn Diagram: Figure 1.1
The shaded regions are the indicated events.
More relations: (E^c)^c = E, E ∪ E^c = S, EE^c = ∅
Commutative laws: E ∪ F = F ∪ E, EF = FE
Associative laws: E ∪ (F ∪ G) = (E ∪ F) ∪ G, E(FG) = (EF)G
Distributive laws: (EF) ∪ G = (E ∪ G)(F ∪ G), E(F ∪ G) = EF ∪ EG
De Morgan's first law: (E ∪ G)^c = E^c G^c, and in general (⋃_{i=1}^{n} E_i)^c = ⋂_{i=1}^{n} E_i^c
De Morgan's second law: (EG)^c = E^c ∪ G^c, and in general (⋂_{i=1}^{n} E_i)^c = ⋃_{i=1}^{n} E_i^c
A useful decomposition: E = ES = E(F ∪ F^c) = EF ∪ EF^c, where EF and EF^c are disjoint
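These identities can be checked mechanically by modeling events as Python sets inside a small sample space; the particular sets below are arbitrary examples chosen for illustration:

```python
# Sample space and three events, modeled as Python sets.
S = set(range(10))
E, F, G = {0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}

def comp(A):
    """Complement of A relative to the sample space S."""
    return S - A

# De Morgan's laws
assert comp(E | F) == comp(E) & comp(F)
assert comp(E & F) == comp(E) | comp(F)
# Distributive laws
assert (E & F) | G == (E | G) & (F | G)
assert E & (F | G) == (E & F) | (E & G)
# E = EF ∪ EF^c, with EF and EF^c disjoint
assert E == (E & F) | (E & comp(F))
assert not (E & F) & (E & comp(F))
```

Here `|`, `&`, and `-` play the roles of ∪, ∩, and set difference.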
1.3. Axioms of Probability
Definition (Probability Axioms)
Let S be the sample space of a random phenomenon. Suppose that to each event A of S a number, denoted by P(A), is associated. If P satisfies the following axioms, then it is called a probability, and the number P(A) is said to be the probability of A.
Axiom 1: P(A) ≥ 0
Axiom 2: P(S) = 1
Axiom 3: If {A_1, A_2, A_3, …} is a sequence of mutually exclusive events (i.e., A_iA_j = ∅ for i ≠ j), then
P(⋃_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i)
Note that the axioms of probability are a set of rules that must be satisfied before S and P can be considered a probability model. Probability is a real-valued nonnegative, countably additive function.
“Equally likely” or “equally probable”: we say that A and B are equally likely if P(A) = P(B). Sample points w_1 and w_2 are equally likely if P({w_1}) = P({w_2}).
Theorem 1.1 ( ) 0P ∅ = . The probability of the empty set is 0.
Theorem 1.2 Let {A_1, A_2, …, A_n} be a mutually exclusive set of events. Then
P(⋃_{i=1}^{n} A_i) = Σ_{i=1}^{n} P(A_i)
Note that if A_1 and A_2 are mutually exclusive, then P(A_1 ∪ A_2) = P(A_1) + P(A_2). Also note that 1 = P(S) = P(A ∪ A^c) = P(A) + P(A^c), and therefore P(A) ≤ 1. Hence the probability of the occurrence of an event is always some number between 0 and 1: 0 ≤ P(A) ≤ 1.
Example 1.9 A coin is called unbiased or fair if, whenever it is flipped, the probability of obtaining heads equals the probability of obtaining tails. Suppose that in an experiment an unbiased coin is flipped. The sample space of such an experiment is S = {T, H}. Since the events {H} and {T} are equally likely to occur, P({T}) = P({H}), and since they are mutually exclusive,
P({T, H}) = P({T}) + P({H}).
Hence Axioms 2 and 3 imply that
1 = P(S) = P({H, T}) = P({H}) + P({T}) = P({H}) + P({H}) = 2P({H}).
This gives P({H}) = 1/2 and P({T}) = 1/2. Now suppose that an experiment consists of flipping a biased coin for which the outcome of tails is twice as likely as heads; then P({T}) = 2P({H}). Hence
1 = P(S) = P({H, T}) = P({H}) + P({T}) = P({H}) + 2P({H}) = 3P({H}).
This shows that P({H}) = 1/3; thus P({T}) = 2/3.
Example 1.10 Sharon has baked five loaves of bread that are identical except that one of them is underweight. Sharon's husband chooses one of these loaves at random. Let B_i, 1 ≤ i ≤ 5, be the event that he chooses the i-th loaf. Since all five loaves are equally likely to be drawn, we have
P({B_1}) = P({B_2}) = P({B_3}) = P({B_4}) = P({B_5}).
But the events {B_1}, {B_2}, {B_3}, {B_4}, {B_5} are mutually exclusive, and the sample space is S = {B_1, B_2, B_3, B_4, B_5}. Therefore, by Axioms 2 and 3,
1 = P(S) = P({B_1}) + P({B_2}) + P({B_3}) + P({B_4}) + P({B_5}) = 5P({B_1}).
This gives P({B_1}) = 1/5 and hence P({B_i}) = 1/5, 1 ≤ i ≤ 5. Therefore, the probability that Sharon's husband chooses the underweight loaf is 1/5.
Theorem 1.3
Let S be the sample space of an experiment. If S has N points that are all equally likely to occur, then for any event A of S,
P(A) = N(A)/N,
where N(A) is the number of points of A.
Example 1.11 Let S be the sample space of flipping a fair coin three times and A be the event of at least two heads; then
S = {HHH, HTH, HHT, HTT, THH, THT, TTH, TTT}
and A = {HHH, HTH, HHT, THH}. So N = 8 and N(A) = 4. Therefore, the probability of at least two heads in flipping a fair coin three times is N(A)/N = 4/8 = 1/2.
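Theorem 1.3 invites a brute-force check: enumerate the equally likely outcomes and count. This sketch reproduces Example 1.11 with itertools:

```python
from itertools import product

# All 2**3 = 8 equally likely outcomes of three coin flips.
S = list(product("HT", repeat=3))
A = [w for w in S if w.count("H") >= 2]   # at least two heads
p = len(A) / len(S)                        # N(A)/N = 4/8 = 1/2
```

The same enumeration pattern works for Examples 1.12 and 1.13.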
Example 1.12 An elevator with two passengers stops at the second, third, and fourth floors. If it is equally likely that a passenger gets off at any of the three floors, what is the probability that the passengers get off at different floors?
Solution: Let a and b denote the two passengers and a2b4 mean that a gets off at the
second floor and b gets off at the fourth floor, with similar representations for the other cases. Let A be the event that the passengers get off at different floors. Then
S = {a2b2, a2b3, a2b4, a3b2, a3b3, a3b4, a4b2, a4b3, a4b4}
and A = {a2b3, a2b4, a3b2, a3b4, a4b2, a4b3}. So N = 9 and N(A) = 6. Therefore, the desired probability is N(A)/N = 6/9 = 2/3.
Example 1.13 A number is selected at random from the set of natural numbers {1, 2, 3, …, 1000}. What is the probability that the number is divisible by 3?
Solution: Here the sample space contains 1000 points, so N = 1000. Let A be the set of all numbers between 1 and 1000 that are divisible by 3. Then A = {3m : 1 ≤ m ≤ 333}, so N(A) = 333. Therefore, the probability that a random natural number between 1 and 1000 is divisible by 3 is 333/1000 = 0.333.
1.4. Basic Theorems
Theorem 1.4 For any event A, P(A^c) = 1 − P(A).
Theorem 1.5 If A ⊆ B, then P(B − A) = P(BA^c) = P(B) − P(A).
Corollary If A ⊆ B, then P(B) ≥ P(A).
Theorem 1.6 P(A ∪ B) = P(A) + P(B) − P(AB).
From Theorem 1.6, we have
P(A_1 ∪ A_2 ∪ A_3) = P(A_1) + P(A_2) + P(A_3) − P(A_1A_2) − P(A_1A_3) − P(A_2A_3) + P(A_1A_2A_3)
P(A_1 ∪ A_2 ∪ A_3 ∪ A_4) = P(A_1) + P(A_2) + P(A_3) + P(A_4) − P(A_1A_2) − P(A_1A_3) − P(A_1A_4) − P(A_2A_3) − P(A_2A_4) − P(A_3A_4) + P(A_1A_2A_3) + P(A_1A_2A_4) + P(A_1A_3A_4) + P(A_2A_3A_4) − P(A_1A_2A_3A_4)
Inclusion-Exclusion Principle:
P(⋃_{i=1}^{n} A_i) = Σ_{i=1}^{n} P(A_i) − Σ_{i<j} P(A_iA_j) + Σ_{i<j<k} P(A_iA_jA_k) − ⋯ + (−1)^{n−1} P(A_1A_2⋯A_n)
Example 1.15 Suppose that in a community of 400 adults, 300 bike or swim or do both, 160 swim, and 120 swim and bike. What is the probability that an adult, selected at random
from this community, bikes? Answer: 0.65
Example 1.16 A number is chosen at random from the set of numbers {1, 2, 3, …, 1000}. What is the probability that it is divisible by 3 or 5 (i.e., by 3 or 5 or both)? Answer: 0.467
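Example 1.16 is a direct application of inclusion-exclusion with A = “divisible by 3” and B = “divisible by 5” (so AB = “divisible by 15”). A quick check:

```python
# P(A ∪ B) = P(A) + P(B) - P(AB): count multiples of 3, 5, and 15 in 1..1000.
N = 1000
favorable = N // 3 + N // 5 - N // 15      # 333 + 200 - 66 = 467
p = favorable / N                          # 0.467

# Cross-check by direct enumeration of the sample space.
direct = sum(1 for k in range(1, N + 1) if k % 3 == 0 or k % 5 == 0)
assert favorable == direct
```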
Theorem 1.7 P(A) = P(AB) + P(AB^c)
Example 1.19 In a community, 32% of the population are male smokers; 27% are female smokers. What percentage of the population of this community smoke?
Answer: 59%
Odds: In the real world, especially for games of chance, it is common to express probability in terms of odds.
The odds in favor of an event A are r to s if P(A)/P(A^c) = r/s (equivalently, P(A) = r/(r + s)).
The odds against an event A are r to s if P(A^c)/P(A) = r/s (equivalently, P(A) = s/(r + s)).
Thus if the odds in favor of an event A are r to s, then the odds against A are s to r.
Example 1.11: the odds in favor of HHH are 1/8 to 7/8 or, equivalently, 1 to 7.
1.5. Continuity of Probability Function
Continuity of a real-valued function f: lim_{x→c} f(x) = f(c) for all c ∈ R,
which is equivalent to the sequential criterion lim_{n→∞} f(x_n) = f(lim_{n→∞} x_n) for every convergent sequence {x_n}_{n=1}^{∞} in R.
Increasing sequence: A sequence {E_n, n ≥ 1} of events of a sample space is called increasing if E_1 ⊆ E_2 ⊆ E_3 ⊆ ⋯ ⊆ E_n ⊆ E_{n+1} ⊆ ⋯. Therefore, lim_{n→∞} E_n = ⋃_{n=1}^{∞} E_n.
Decreasing sequence: A sequence {E_n, n ≥ 1} of events of a sample space is called decreasing if E_1 ⊇ E_2 ⊇ E_3 ⊇ ⋯ ⊇ E_n ⊇ E_{n+1} ⊇ ⋯. Thus, lim_{n→∞} E_n = ⋂_{n=1}^{∞} E_n.
Theorem 1.8 (Continuity of Probability Function) For any increasing or decreasing sequence of events {E_n, n ≥ 1},
lim_{n→∞} P(E_n) = P(lim_{n→∞} E_n)
1.6. Probabilities 0 and 1
There exist infinitely many events each with probability 1, and infinitely many events each with probability 0.
P(E) = 1 does not mean E = S, and P(E) = 0 does not mean E = ∅.
In random selection of points from a continuous interval, say (0, 1), the probability of the occurrence of any particular point is 0 (shown on page 30 via an example). Let E = {t}, t ∈ (0, 1), and B_t = (0, 1) − {t}. Then P(E) = 0 and P(B_t) = 1. Therefore, there are infinitely many events with probability 0 and infinitely many events with probability 1.
1.7. Random Selection of Points from Intervals
The probability that a random selection of points from (a, b) yields any particular point is 0 (Section 1.6).
The probability that a random point selected from (a, b) falls into the interval (a, (a + b)/2) is 1/2. The probability that it falls into ((a + b)/2, b) is also 1/2.
The probability of the event that a random point from (a, b) falls into a subinterval (α, β) is equal to (β − α)/(b − a). (This can be deduced from Theorem 1.3.)
Definition (uniform distribution)
A point is said to be randomly selected from an interval (a, b) if any two subintervals of (a, b) that have the same length are equally likely to include the point. The probability associated with the event that the subinterval (α, β) contains the point is defined to be (β − α)/(b − a).
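A short simulation illustrates the definition; the interval (2, 7) and subinterval (3, 4.5) are arbitrary illustrative choices:

```python
import random

random.seed(1)
a, b = 2.0, 7.0          # the interval (a, b)
alpha, beta = 3.0, 4.5   # a subinterval (alpha, beta)

# Fraction of uniform draws from (a, b) that land in (alpha, beta).
n = 200_000
hits = sum(1 for _ in range(n) if alpha < random.uniform(a, b) < beta)
estimate = hits / n
exact = (beta - alpha) / (b - a)   # (β − α)/(b − a) = 1.5/5 = 0.3
```

The empirical fraction approaches (β − α)/(b − a) as n grows.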
Chapter 2. Combinatorial Methods
2.1. Introduction
Combinatorial Analysis
Dealing with methods of counting
Enabling us to count systematically
Some problems in probability can be solved just by counting the total number of sample points and the number of ways that an event can occur when all sample points are equally likely. (c.f. Theorem 1.3)
2.2. Counting Principle
Theorem 2.1 (Counting Principle) If the set E contains n elements and the set F contains m elements, there are nm ways in which we can choose first an element of E and then an element of F.
Theorem 2.2 (Generalized Counting Principle) Let E_1, E_2, …, E_k be sets with n_1, n_2, …, n_k elements, respectively. Then there are n_1 × n_2 × ⋯ × n_k ways in which we can first choose an element of E_1, then an element of E_2, then an element of E_3, …, and finally an element of E_k.
Example 2.1 How many outcomes are there if we throw five dice? Answer: 6^5 = 7776
Example 2.2 In tossing four fair dice, what is the probability of at least one 3? Answer: 671/1296 ≈ 0.52
Example 2.4 Rose has invited n friends to her birthday party. If they all attend, and each one shakes hands with everyone else at the party exactly once, what is the number of handshakes? Answer: n(n+1)/2
Example 2.6 (Standard Birthday Problem) What is the probability that at least two students of a class of size n have the same birthday? Compute the numerical values of such
probabilities for n = 23, 30, 50, and 60. Assume that the birth rates are constant throughout the year and that each year has 365 days. Answer: 0.507, 0.706, 0.970, 0.994.
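The birthday probabilities follow from Theorem 1.3: P(no two share a birthday) = 365 × 364 × ⋯ × (365 − n + 1) / 365^n, and the answer is its complement. A sketch:

```python
def p_shared_birthday(n):
    """P(at least two of n people share a birthday), assuming 365 equally likely days."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

probs = {n: p_shared_birthday(n) for n in (23, 30, 50, 60)}
```

Note that n = 23 is already enough to push the probability past 1/2.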
Power Set The set of all subsets of a set A is called the power set of A.
Theorem 2.3 (Number of Subsets of a Set)
A set with n elements has 2^n subsets. (The proof is based on the generalized counting principle.)
Example 2.7 A restaurant advertises that it offers over 1000 varieties of pizza. If, at the restaurant, it is possible to have on a pizza any combination of pepperoni, mushrooms, sausage, green peppers, onions, anchovies, salami, bacon, olives, and ground beef, is the restaurant’s advertisement true? Answer: True.
Tree Diagrams
Useful pictorial representations breaking down a complex counting problem into smaller, more tractable ones.
The number of possible ways that an experiment can be performed is finite. Systematically identifying all possible cases.
Example 2.8 Bill and John keep playing chess until one of them wins two games in a row or three games altogether. In what percent of all possible cases does the game end because Bill wins three games without winning two in a row? Answer: 10%
Example 2.9 Mark has $4. He decides to bet $1 on the flip of a fair coin four times. What is the probability that (a) he breaks even; (b) he wins money? Answer: 6/16, 5/16.
2.3. Permutations
Definition An ordered arrangement of r objects from a set A containing n objects (0 < r ≤ n) is called an r-element permutation of A, or a permutation of the elements of A taken r at a time. The number of r-element permutations of a set A containing n elements is denoted by nPr.
Notes: nPr = n(n − 1)(n − 2)⋯(n − r + 1) = n!/(n − r)!
and
nPn = n(n − 1)(n − 2)⋯1 = n!
Example 2.10 Three people, Brown, Smith, and Jones, must be scheduled for job interviews. In how many different orders can this be done? Answer: 6.
Example 2.11 Suppose that two anthropology, four computer science, three statistics, three biology, and five music books are put on a bookshelf in a random arrangement. What is the probability that the books of the same subject are together? Answer: P(A) = (5! × 2! × 4! × 3! × 3! × 5!)/17! ≈ 7 × 10^{−8}
Example 2.12 If five boys and five girls sit in a row in random order, what is the probability that no two children of the same sex sit together? Answer: P(A) = (2 × 5! × 5!)/10! ≈ 0.008
Theorem 2.4
The number of distinguishable permutations of n objects of k different types, where
n_1 are alike, n_2 are alike, …, n_k are alike, and n = n_1 + n_2 + ⋯ + n_k, is
n!/(n_1! × n_2! × ⋯ × n_k!).
Example 2.13 How many different 10-letter codes can be made using three a's, four b's, and three c's? Answer: 10!/(3! × 4! × 3!) = 4200
Example 2.15 A fair coin is flipped 10 times. What is the probability of obtaining exactly three heads? Answer: P(A) = [10!/(3! × 7!)]/2^{10} ≈ 0.12
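Theorem 2.4 and Example 2.13 can be verified numerically; the brute-force check on a smaller word is an added illustration:

```python
from math import factorial
from itertools import permutations

# Example 2.13: 10!/(3! * 4! * 3!) distinguishable 10-letter codes.
codes = factorial(10) // (factorial(3) * factorial(4) * factorial(3))

# Brute-force check of Theorem 2.4 on a smaller word: "aaabb" has
# 5!/(3! * 2!) = 10 distinguishable arrangements.
small = len(set(permutations("aaabb")))
```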
2.4. Combinations
Definition An unordered arrangement of r objects from a set A containing n objects (0 < r ≤ n) is called an r-element combination of A, or a combination of the elements of A taken r at a time.
Notes: nCr = C(n, r) = n!/[(n − r)! × r!] = nPr/r!
C(n, 0) = C(n, n) = 1, C(n, 1) = C(n, n − 1) = n
For any 0 ≤ r ≤ n:
C(n, r) = C(n, n − r) and C(n + 1, r) = C(n, r) + C(n, r − 1)
Example 2.16 In how many ways can two mathematics and three biology books be selected from eight mathematics and six biology books? Answer: 560
Example 2.18 In a small town, 11 of the 25 schoolteachers are against abortion, eight are for abortion, and the rest are indifferent. A random sample of five schoolteachers is selected for an interview. What is the probability that (a) all of them are for abortion; (b) all of them have the same opinion? Answer: 0.0011, 0.0099
Example 2.20 From an ordinary deck of 52 cards, seven cards are drawn at random and without replacement. What is the probability that at least one of the cards is a king? Answer: 0.4496
Example 2.21 What is the probability that a poker hand is a full house? A poker hand consists of five randomly selected cards from an ordinary deck of 52 cards. It is a full house if three cards are of one denomination and two cards are of another denomination: for example, three queens and two 4’s.
Answer: [C(13, 1) × C(12, 1) × C(4, 3) × C(4, 2)] / C(52, 5) ≈ 0.0014
Example 2.22 Show that the number of different ways n indistinguishable objects can be placed into k distinguishable cells is C(n + k − 1, n) = C(n + k − 1, k − 1).
Example 2.23 Let n be a positive integer, and let x_1 + x_2 + ⋯ + x_k = n be a given equation. A vector (x_1, x_2, …, x_k) satisfying x_1 + x_2 + ⋯ + x_k = n is said to be a nonnegative integer solution of the equation if, for each i, 1 ≤ i ≤ k, x_i is a nonnegative integer. It is said to be a positive integer solution of the equation if, for each i, 1 ≤ i ≤ k, x_i is a positive integer. (a) How many distinct nonnegative integer solutions does the equation x_1 + x_2 + ⋯ + x_k = n have? (b) How many distinct positive integer solutions does it have? Answer: (a) C(n + k − 1, k − 1); (b) C(n − 1, k − 1).
Theorem 2.5 (Binomial Expansion) For any integer n ≥ 0,
(x + y)^n = Σ_{i=0}^{n} C(n, i) x^{n−i} y^i
= C(n, 0)x^n + C(n, 1)x^{n−1}y + C(n, 2)x^{n−2}y^2 + ⋯ + C(n, n − 1)xy^{n−1} + C(n, n)y^n
Example 2.25 What is the coefficient of x^2 y^3 in the expansion of (2x + 3y)^5? Answer: 1080
Example 2.26 Evaluate the sum C(n, 0) + C(n, 1) + C(n, 2) + C(n, 3) + ⋯ + C(n, n).
Answer: 2^n.
Example 2.27 Evaluate the sum C(n, 1) + 2C(n, 2) + 3C(n, 3) + ⋯ + nC(n, n).
Answer: n2^{n−1}.
Solution: i × C(n, i) = i × n!/[i!(n − i)!] = n × (n − 1)!/[(i − 1)!(n − i)!] = n × C(n − 1, i − 1).
So from Example 2.26, we have
C(n, 1) + 2C(n, 2) + 3C(n, 3) + ⋯ + nC(n, n) = n[C(n − 1, 0) + C(n − 1, 1) + C(n − 1, 2) + ⋯ + C(n − 1, n − 1)] = n2^{n−1}.
i
Example 2.28 Prove that Σ_{i=0}^{n} C(n, i)^2 = C(2n, n).
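The three identities in Examples 2.26–2.28 can be verified numerically with `math.comb`; a sketch:

```python
from math import comb

# Check, for several n:
#   C(n,0) + C(n,1) + ... + C(n,n)          == 2**n           (Example 2.26)
#   C(n,1) + 2C(n,2) + ... + nC(n,n)        == n * 2**(n-1)   (Example 2.27)
#   C(n,0)^2 + C(n,1)^2 + ... + C(n,n)^2    == C(2n, n)       (Example 2.28)
for n in range(1, 16):
    assert sum(comb(n, i) for i in range(n + 1)) == 2 ** n
    assert sum(i * comb(n, i) for i in range(1, n + 1)) == n * 2 ** (n - 1)
    assert sum(comb(n, i) ** 2 for i in range(n + 1)) == comb(2 * n, n)
```

A numerical check is not a proof, of course, but it is a quick way to catch algebra mistakes.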
Theorem 2.6 (Multinomial Expansion)
(x_1 + x_2 + ⋯ + x_k)^n = Σ n!/(n_1! n_2! ⋯ n_k!) x_1^{n_1} x_2^{n_2} ⋯ x_k^{n_k}
Note that the sum is taken over all nonnegative integers n_1, n_2, …, n_k such that n_1 + n_2 + ⋯ + n_k = n.
2.5. Stirling’s Formula
Theorem 2.7 (Stirling’s Formula) n! ∼ √(2πn) n^n e^{−n},
where the sign ∼ means lim_{n→∞} n!/[√(2πn) n^n e^{−n}] = 1.
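A quick numerical check of Stirling’s formula; the sample values of n are arbitrary:

```python
from math import e, factorial, pi, sqrt

def stirling(n):
    """Stirling's approximation: sqrt(2*pi*n) * n**n * e**(-n)."""
    return sqrt(2 * pi * n) * n ** n * e ** (-n)

# The ratio n!/stirling(n) decreases toward 1 as n grows.
ratios = [factorial(n) / stirling(n) for n in (1, 10, 100)]
```

Even at n = 10 the relative error is under 1%, which is why the formula is so useful for approximating large factorials.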
Chapter 3. Conditional Probability and Independence
3.1. Conditional Probability
The conditional probability of A given B
Knowing that event B has occurred changes the chances of the occurrence of event A
Denoted by P(A|B): the probability of A given B
Definition:
If P(B) > 0, then P(A|B) = P(AB)/P(B). (AB denotes the joint occurrence of the two events.)
Example 3.1 In a certain region of Russia, the probability that a person lives at least 80 years is 0.75, and the probability that he or she lives at least 90 years is 0.63. What is the probability that a randomly selected 80-year-old person from this region will survive to become 90? Answer: 0.84
Example 3.2 From the set of all families with two children, a family is selected at random and is found to have a girl. What is the probability that the other child of the family is a girl? Assume that in a two-child family all sex distributions are equally probable. Answer: 1/3
Example 3.3 From the set of all families with two children, a child is selected at random and is found to be a girl. What is the probability that the second child of this girl’s family is also a girl? Assume that in a two-child family all sex distributions are equally probable. Answer: 1/2
Example 3.5 We draw eight cards at random from an ordinary deck of 52 cards. Given that three of them are spades, what is the probability that the remaining five are also spades? Answer: 65.44 10−×
Solution: Interpret B as the event that at least three of the eight cards are spades, and let A be the event that all eight are spades (so AB = A). Then
P(A|B) = P(AB)/P(B) = [C(13, 8)/C(52, 8)] / [Σ_{x=3}^{8} C(13, x)C(39, 8 − x)/C(52, 8)] ≈ 5.44 × 10^{−6}.
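Example 3.5 can be checked numerically with `math.comb`; here B is read as “at least three of the eight cards are spades” and A as “all eight are spades”, so AB = A:

```python
from math import comb

# B: at least three of the eight cards are spades; A: all eight are spades (AB = A).
total = comb(52, 8)
p_ab = comb(13, 8) / total
p_b = sum(comb(13, x) * comb(39, 8 - x) for x in range(3, 9)) / total
p_a_given_b = p_ab / p_b
```

The common factor C(52, 8) cancels, so the answer reduces to C(13, 8) divided by the sum in the denominator.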
Theorem 3.1
Let S be the sample space of an experiment, and let B be an event of S with P(B) > 0. Then the conditional probability satisfies the probability axioms, i.e.,
(1) P(A|B) ≥ 0 for any event A of S
(2) P(S|B) = 1
(3) If A_1, A_2, …, A_n, … is a sequence of mutually exclusive events, then
P(⋃_{i=1}^{∞} A_i | B) = Σ_{i=1}^{∞} P(A_i|B)
Reduction of Sample Space Let B be an event of a sample space S with P(B) > 0. For A ⊆ B, define Q(A) = P(A|B).
Q: subsets of B → [0, 1], with Q(A) ≥ 0 and Q(B) = P(B|B) = 1, and
if A_1, A_2, …, A_n, … is a sequence of mutually exclusive subsets of B, then
Q(⋃_{i=1}^{∞} A_i) = P(⋃_{i=1}^{∞} A_i | B) = Σ_{i=1}^{∞} P(A_i|B) = Σ_{i=1}^{∞} Q(A_i)
Thus Q satisfies the probability axioms and is a probability function.
For Q, the sample space is reduced from S to B. (cf. for P, the sample space is S.) Suppose that we are interested in P(E|B), where E ⊆ S. It is usually easier to compute Q(EB) than P(E|B) directly.
Example 3.6 A child mixes 10 good and three dead batteries. To find the dead batteries, his father tests them one-by-one and without replacement. If the first four batteries tested are all good, what is the probability that the fifth one is dead? Answer: 3/9
Example 3.7 A farmer decides to test four fertilizers for his soybean fields. He buys 32 bags of fertilizers, eight bags from each kind, and tries them randomly on 32 plots, eight plots from each of field A, B, C, and D, one bag per plot. If from type I fertilizer one bag is tried on field A and three on field B, what is the probability that two bags of type I fertilizer are tried on field D? Answer: 0.43
For all choices of B with P(B) > 0:
P(∅|B) = 0
P(A^c|B) = 1 − P(A|B)
If C ⊆ A, then P(AC^c|B) = P(A − C|B) = P(A|B) − P(C|B)
If C ⊆ A, then P(C|B) ≤ P(A|B)
P(A ∪ C|B) = P(A|B) + P(C|B) − P(AC|B)

P(A|B) = P(AC|B) + P(AC^c|B)

P(∪_{i=1}^n A_i | B) = Σ_{i=1}^n P(A_i|B) − Σ_{i<j} P(A_iA_j|B) + ⋯ + (−1)^{n−1} P(A_1A_2⋯A_n|B)

For any increasing or decreasing sequence of events {A_n, n ≥ 1}, lim_{n→∞} P(A_n|B) = P(lim_{n→∞} A_n | B).
3.2. Law of Multiplication
Law of Multiplication Calculating P(AB) from P(A|B) or P(B|A):

P(AB) = P(B)P(A|B),  P(B) > 0,
P(BA) = P(A)P(B|A),  P(A) > 0.

Since P(AB) = P(BA), we have P(AB) = P(A)P(B|A) = P(B)P(A|B).
Example 3.9 Suppose that five good fuses and two defective ones have been mixed up. To find the defective fuses, we test them one-by-one, at random and without replacement. What is the probability that we are lucky and find both of the defective fuses in the first two tests? Answer: 1/21
Solution: Let D_1 and D_2 be the events of finding a defective fuse in the first and second tests, respectively. We are interested in P(D_1D_2). Using (3.5), we get

P(D_1D_2) = P(D_1)P(D_2|D_1) = (2/7)(1/6) = 1/21.
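Because only 7! = 5,040 orderings of the fuses exist, the multiplication-law answer can also be confirmed by exhaustive enumeration. This check is my own sketch; the fuse labels are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# Label the two defective fuses D1, D2 and the five good ones G1..G5,
# then count the orderings whose first two tests hit the defective pair.
fuses = ["D1", "D2", "G1", "G2", "G3", "G4", "G5"]
favorable = sum(set(p[:2]) == {"D1", "D2"} for p in permutations(fuses))
prob = Fraction(favorable, factorial(len(fuses)))
# Matches P(D1 D2) = P(D1) P(D2 | D1) = (2/7)(1/6) = 1/21.
```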
Extended Law of Multiplication:

P(ABC) = P(A)P(B|A)P(C|AB),

since

P(A)P(B|A)P(C|AB) = P(A) · [P(BA)/P(A)] · [P(CAB)/P(AB)] = P(ABC).
Theorem 3.2 (Generalized Law of Multiplication) If P(A_1A_2⋯A_{n−1}) > 0, then

P(A_1A_2⋯A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1A_2)⋯P(A_n|A_1A_2⋯A_{n−1}).
Example 3.11 Suppose that five good and two defective fuses have been mixed up. To find the defective ones, we test them one-by-one, at random and without replacement. What is the probability that we find both of the defective fuses in exactly three tests? Answer: 0.095
Solution: Let D1, D2, and D3 be the events that the first, second, and third fuses tested are defective, respectively. Let G1, G2, and G3 be the events that the first, second, and third fuses tested are good, respectively. We are interested in the probability of the event
G_1D_2D_3 ∪ D_1G_2D_3. We have

P(G_1D_2D_3 ∪ D_1G_2D_3) = P(G_1D_2D_3) + P(D_1G_2D_3)
= P(G_1)P(D_2|G_1)P(D_3|G_1D_2) + P(D_1)P(G_2|D_1)P(D_3|D_1G_2)
= (5/7)(2/6)(1/5) + (2/7)(5/6)(1/5) ≈ 0.095.
3.3. Law of Total Probability
Theorem 3.3 (Law of Total Probability)
Let B be an event with P(B) > 0 and P(B^c) > 0. Then for any event A,

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).

Proof: P(A) = P(AB ∪ AB^c) = P(AB) + P(AB^c) = P(A|B)P(B) + P(A|B^c)P(B^c).
Example 3.12 An insurance company rents 35% of the cars for its customers from agency I and 65% from agency II. If 8% of the cars of agency I and 5% of the cars of agency II break down during the rental periods, what is the probability that a car rented by this insurance company breaks down?
Answer: 0.0605
P(A) = P(A|I)P(I) + P(A|II)P(II) = (0.08)(0.35) + (0.05)(0.65) = 0.0605.
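The arithmetic is a direct instance of Theorem 3.3; a minimal sketch (the dictionary names are mine):

```python
# Law of total probability with the partition {agency I, agency II}:
# P(breakdown) = P(B | I) P(I) + P(B | II) P(II).
p_agency = {"I": 0.35, "II": 0.65}          # P(car comes from agency)
p_break_given = {"I": 0.08, "II": 0.05}     # P(breakdown | agency)

p_break = sum(p_break_given[a] * p_agency[a] for a in p_agency)
```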
Example 3.14 (Gambler’s Ruin Problem) Two gamblers play the game of “heads or tails,” in which each time a fair coin lands heads up, player A wins $1 from B, and each time it lands tails up, players B wins $1 from A. Suppose that player A initially has a dollars and player B has b dollars. If they continue to play this game successively, what is the probability that (a) A will be ruined; (b) the game goes forever with nobody winning? Answer: b/(a+b), either A is ruined or B is ruined.
Solution:
(a) Let E be the event that A will be ruined if he or she starts with i dollars, and let p_i = P(E). Our aim is to calculate p_a. To do so, we define F to be the event that A wins the first game. Then

P(E) = P(E|F)P(F) + P(E|F^c)P(F^c).

In this formula, P(E|F) is the probability that A will be ruined given that he wins the first game; so P(E|F) is the probability that A will be ruined if his capital is i + 1; that is, P(E|F) = p_{i+1}. Similarly, P(E|F^c) = p_{i−1}. Hence

p_i = (1/2)p_{i+1} + (1/2)p_{i−1}.   (3.8)
Now p_0 = 1, because if A starts with 0 dollars, he or she is already ruined. Also, if the capital of A reaches a + b, then B is ruined; thus p_{a+b} = 0. Therefore, we have to solve the system of recursive equations (3.8), subject to the boundary conditions p_0 = 1 and p_{a+b} = 0. To do so, note that (3.8) implies that
p_{i+1} − p_i = p_i − p_{i−1}.

Hence, letting α = p_1 − p_0, we get

p_{i+1} − p_i = p_i − p_{i−1} = p_{i−1} − p_{i−2} = ⋯ = p_1 − p_0 = α.

Thus
p_1 = p_0 + α,
p_2 = p_1 + α = p_0 + 2α,
p_3 = p_2 + α = p_0 + 3α,
⋮
p_i = p_0 + iα.
Now p_0 = 1 gives p_i = 1 + iα. But p_{a+b} = 0; thus 1 + (a + b)α = 0. This gives α = −1/(a + b); therefore,

p_i = 1 − i/(a + b) = (a + b − i)/(a + b).
In particular, /( )ap b a b= + . That is, the probability that A will be ruined is b/(a+b).
(b) The same method can be used, with obvious modifications, to calculate q_i, the probability that B is ruined if he or she starts with i dollars. The result is

q_i = (a + b − i)/(a + b).
Since B starts with b dollars, he or she will be ruined with probability q_b = a/(a + b). Thus the probability that the game goes on forever with nobody winning is 1 − (q_b + p_a). But 1 − (q_b + p_a) = 1 − a/(a + b) − b/(a + b) = 0. Therefore, if this game is played successively, eventually either A is ruined or B is ruined.
Remark 3.2 If the game is not fair and on each play gambler A wins $1 from B with probability p, 0 < p < 1, p ≠ 1/2, and loses $1 to B with probability q = 1 − p, relation (3.8) becomes

p_i = p·p_{i+1} + q·p_{i−1}.

But p_i = (p + q)p_i, so (p + q)p_i = p·p_{i+1} + q·p_{i−1}. This gives

q(p_i − p_{i−1}) = p(p_{i+1} − p_i),

which reduces to

p_{i+1} − p_i = (q/p)(p_i − p_{i−1}).

Using this relation and following the same line of argument lead us to

p_i = p_0 + [1 + (q/p) + (q/p)^2 + ⋯ + (q/p)^{i−1}]α,

where α = p_1 − p_0. Using p_0 = 1, we obtain

p_i = 1 + α · [(q/p)^i − 1] / [(q/p) − 1].
After finding α from p_{a+b} = 0 and substituting in this equation, we get
p_i = [(q/p)^i − (q/p)^{a+b}] / [1 − (q/p)^{a+b}] = [1 − (p/q)^{a+b−i}] / [1 − (p/q)^{a+b}],

where the last equality is obtained by multiplying the numerator and denominator of the fraction by (p/q)^{a+b}. In particular,

p_a = [1 − (p/q)^b] / [1 − (p/q)^{a+b}],

and, similarly,

q_b = [1 − (q/p)^a] / [1 − (q/p)^{a+b}].
In this case also, p_a + q_b = 1, meaning that eventually either A or B will be ruined and the game does not go on forever. For comparison, suppose that A and B both start with $10. If they play a fair game, the probability that A will be ruined is 1/2, and the probability that B will be ruined is also 1/2. If they play an unfair game with p = 3/4, q = 1/4, then p_{10}, the probability that A will be ruined, is almost 0.00002.
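Both conclusions, the ruin probability b/(a + b) in the fair game and the tiny ruin probability for the favored player when p = 3/4, can be checked by simulating the walk of A's capital directly. This simulation is my own sketch; the trial count and seed are arbitrary.

```python
import random

def ruin_probability(a, b, p, trials=10_000, seed=7):
    """Estimate the probability that gambler A, starting with a dollars
    and winning each round with probability p, hits 0 before a + b."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        capital = a
        while 0 < capital < a + b:
            capital += 1 if rng.random() < p else -1
        ruined += capital == 0
    return ruined / trials

fair = ruin_probability(10, 10, 0.5)     # theory: b/(a+b) = 1/2
unfair = ruin_probability(10, 10, 0.75)  # theory: ~0.00002, essentially 0
```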
Definition (Partition) Let {B_1, B_2, …, B_n} be a set of nonempty subsets of the sample space S of an experiment. If the events B_1, B_2, …, B_n are mutually exclusive and ∪_{i=1}^n B_i = S, the set {B_1, B_2, …, B_n} is called a partition of S.

Note: For any event A with A and A^c both nonempty, the set {A, A^c} is a partition of S.
Theorem 3.4 (Law of Total Probability) If {B_1, B_2, …, B_n} is a partition of the sample space of an experiment and P(B_i) > 0 for i = 1, 2, …, n, then for any event A of S we have

P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + ⋯ + P(A|B_n)P(B_n) = Σ_{i=1}^n P(A|B_i)P(B_i).

More generally, let {B_1, B_2, …} be a sequence of mutually exclusive events of S such that ∪_{i=1}^∞ B_i = S. Suppose that, for all i ≥ 1, P(B_i) > 0. Then for any event A of S,

P(A) = Σ_{i=1}^∞ P(A|B_i)P(B_i).
Example 3.17 An urn contains 10 white and 12 red chips. Two chips are drawn at random and, without looking at their colors, are discarded. What is the probability that a third chip drawn is red? Answer: 12/22
Solution: For i ≥ 1, let R_i be the event that the i-th chip drawn is red and W_i the event that it is white. Intuitively, it should be clear that the two discarded chips provide no information, so P(R_3) = 12/22, the same as if it were the first chip drawn from the urn. To prove this mathematically, note that {R_2W_1, W_2R_1, R_2R_1, W_2W_1} is a partition of the sample space; therefore,

P(R_3) = P(R_3|R_2W_1)P(R_2W_1) + P(R_3|W_2R_1)P(W_2R_1) + P(R_3|R_2R_1)P(R_2R_1) + P(R_3|W_2W_1)P(W_2W_1).   (3.9)
Now

P(R_2W_1) = P(R_2|W_1)P(W_1) = (12/21)(10/22) = 20/77,
P(W_2R_1) = P(W_2|R_1)P(R_1) = (10/21)(12/22) = 20/77,
P(R_2R_1) = P(R_2|R_1)P(R_1) = (11/21)(12/22) = 22/77,
P(W_2W_1) = P(W_2|W_1)P(W_1) = (9/21)(10/22) = 15/77.
Substituting these values in (3.9), we get

P(R_3) = (11/20)(20/77) + (11/20)(20/77) + (10/20)(22/77) + (12/20)(15/77) = 12/22.
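The partition computation above can be reproduced exactly with rational arithmetic; the sketch below is mine (the variable names are illustrative).

```python
from fractions import Fraction

F = Fraction
# Joint probabilities of the colors of the two discarded chips
# (10 white, 12 red in the urn), as in Example 3.17:
p_RW = F(12, 21) * F(10, 22)   # P(R2 W1)
p_WR = F(10, 21) * F(12, 22)   # P(W2 R1)
p_RR = F(11, 21) * F(12, 22)   # P(R2 R1)
p_WW = F(9, 21) * F(10, 22)    # P(W2 W1)

# Condition on what remains among the 20 chips left in the urn:
p_R3 = (F(11, 20) * p_RW + F(11, 20) * p_WR
        + F(10, 20) * p_RR + F(12, 20) * p_WW)
# p_R3 reduces to 12/22, the same as for the very first draw.
```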
3.4. Bayes’ Formula
Theorem 3.5 (Bayes' Theorem)
Let {B_1, B_2, …, B_n} be a partition of the sample space S of an experiment. If P(B_i) > 0 for i = 1, 2, …, n, then for any event A of S with P(A) > 0,

P(B_k|A) = P(A|B_k)P(B_k) / [P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + ⋯ + P(A|B_n)P(B_n)].

(Note that P(B_k|A) = P(B_kA)/P(A).)

In statistical applications of Bayes' theorem, B_1, B_2, …, B_n are called hypotheses, P(B_i) is called the prior probability of B_i, and the conditional probability P(B_i|A) is called the posterior probability of B_i after the occurrence of A.
Simplest form of Bayes' formula:

For any event B of S with B and B^c both nonempty, the set {B, B^c} is a partition of S. Thus, by Theorem 3.5, if P(B) > 0 and P(B^c) > 0, then for any event A of S with P(A) > 0,

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]

and

P(B^c|A) = P(A|B^c)P(B^c) / [P(A|B)P(B) + P(A|B^c)P(B^c)].

⇒ calculation of the posterior probability P(B|A) from the prior probability P(B) and the conditional probability P(A|B).
Note 1: The typical situation is that A happens logically or temporally after B, so that P(B) and P(A|B) can be readily computed. Then we want to know the posterior probability P(B|A).

Note 2: In practice, Bayes' formula is used when we know the effect of a cause (hypothesis) and we wish to make some inference about the cause (hypothesis).
Example 3.18 In a study conducted three years ago, 82% of the people in a randomly selected sample were found to have “good” financial credit ratings, while the remaining 18% were found to have “bad” financial credit ratings. Current records of the people from that sample show that 30% of those with bad credit ratings have since improved their ratings to good, while 15% of those with good credit ratings have since changed to having a
bad rating. What percentage of people with good credit ratings now had bad ratings three years ago?
Answer: 0.072
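A minimal sketch of the Bayes computation behind the answer (the dictionary names are mine; note that P(good now | good then) = 1 − 0.15 = 0.85):

```python
# Hypotheses: the rating three years ago was good (G) or bad (B).
# Observed event: the rating now is good.
p_prior = {"G": 0.82, "B": 0.18}
p_good_now = {"G": 1 - 0.15, "B": 0.30}   # P(good now | rating then)

evidence = sum(p_good_now[h] * p_prior[h] for h in p_prior)
posterior_bad = p_good_now["B"] * p_prior["B"] / evidence
# posterior_bad is P(bad then | good now), approximately 0.072
```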
Example 3.21 A box contains seven red and 13 blue balls. Two balls are selected at random and are discarded without their colors being seen. If a third ball is drawn randomly and observed to be red, what is the probability that both of the discarded balls were blue? Answer: 0.46
Solution: Let BB, BR, and RR be the events that the discarded balls are blue and blue, blue and red, red and red, respectively. Also, let R be the event that the third ball drawn is red. Since {BB, BR, RR} is a partition of the sample space, Bayes' formula can be used to calculate P(BB|R).
P(BB|R) = P(R|BB)P(BB) / [P(R|BB)P(BB) + P(R|BR)P(BR) + P(R|RR)P(RR)].

Now

P(BB) = (13/20)(12/19) = 39/95,  P(RR) = (7/20)(6/19) = 21/190,

and

P(BR) = (13/20)(7/19) + (7/20)(13/19) = 91/190,
where the last equation follows since BR is the union of two disjoint events: namely, the first ball discarded was blue and the second red, or vice versa.
Thus

P(BB|R) = (7/18)(39/95) / [(7/18)(39/95) + (6/18)(91/190) + (5/18)(21/190)] ≈ 0.46.
This result can also be found from the tree diagram of Figure 3.6, on page 105. Alternatively, reducing the sample space: given that the third ball was red, there are 13 blue and six red balls that could have been discarded. Thus

P(BB|R) = (13/19)(12/18) ≈ 0.46.
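As a sanity check on both solution methods, the urn can be simulated directly; this sketch is my own (the seed and trial count are arbitrary).

```python
import random

def blue_blue_given_red(trials=200_000, seed=3):
    """Estimate P(both discarded balls blue | third ball red) for the
    urn of Example 3.21 (7 red, 13 blue balls)."""
    rng = random.Random(seed)
    balls = ["R"] * 7 + ["B"] * 13
    hits = conditioned = 0
    for _ in range(trials):
        rng.shuffle(balls)
        if balls[2] == "R":                  # condition on the third draw
            conditioned += 1
            hits += balls[0] == "B" and balls[1] == "B"
    return hits / conditioned

estimate = blue_blue_given_red()
# Exact answer: (13/19)(12/18) = 26/57, about 0.456
```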
3.5. Independence
Independence (statistical independence) If A is independent of B, knowledge regarding the occurrence of event B does not change the chance of the occurrence of event A. That is,

P(A|B) = P(A)

∴ P(AB) = P(B)P(A|B) = P(B)P(A) ⇒ P(AB) = P(A)P(B).

Equivalently,

P(B|A) = P(B) ⇒ P(BA) = P(A)P(B|A) = P(A)P(B).

By symmetry, if A is independent of B, then B is independent of A.
Definition
Two events A and B are called independent if

P(AB) = P(A)P(B).

If two events are not independent, they are called dependent. If A and B are independent, we say that {A, B} is an independent set of events. Note that by this definition any event A with P(A) = 0 or 1 is independent of every event B.
Example 3.22 In the experiment of tossing a fair coin twice, let A and B be the events of getting heads on the first and second tosses, respectively. Intuitively, it is clear that A and B
are independent. To prove this mathematically, note that P(A) = 1/2 and P(B) = 1/2. But since the sample space of this experiment consists of the four equally probable outcomes HH, HT, TH, and TT, we have P(AB) = P(HH) = 1/4. Hence P(AB) = P(A)P(B) is valid, implying the independence of A and B. It is interesting to know that Jean Le Rond d'Alembert (1717–1783), a French mathematician, had argued that, since in the experiment of tossing a fair coin twice the possible numbers of heads are 0, 1, and 2, the probabilities of no heads, one head, and two heads are each 1/3.
Example 3.23 In the experiment of drawing a card from an ordinary deck of 52 cards, let A and B be the events of getting a heart and an ace, respectively. Whether A and B are independent cannot be answered easily on the basis of intuition alone. However, using the defining formula P(AB) = P(A)P(B), this can be answered at once, since P(AB) = 1/52, P(A) = 1/4, P(B) = 1/13, and 1/52 = 1/4 × 1/13. Hence A and B are independent events.
Example 3.24 An urn contains five red and seven blue balls. Suppose that two balls are selected at random and with replacement. Let A and B be the events that the first and the second balls are red, respectively. Then, using the counting principle, we get P(AB) = (5 × 5)/(12 × 12). Now P(AB) = P(A)P(B), since P(A) = 5/12 and P(B) = 5/12. Thus A and B are independent. If we do the same experiment without replacement, then P(B|A) = 4/11, while

P(B) = P(B|A)P(A) + P(B|A^c)P(A^c) = (4/11)(5/12) + (5/11)(7/12) = 5/12,

which might be quite surprising to some. But it is true: if no information is given on the outcome of the first draw, there is no reason for the probability of the second ball being red to differ from 5/12. Thus P(B|A) ≠ P(B), implying that A and B are dependent.
Theorem 3.6 If A and B are independent, then A and B^c are independent as well.

Proof: P(A) = P(AB) + P(AB^c), so

P(AB^c) = P(A) − P(AB) = P(A) − P(A)P(B) = P(A)[1 − P(B)] = P(A)P(B^c).
Corollary If A and B are independent, then cA and cB are independent as well.
Proof: A and B are independent A⇒ and cB are independent cB⇒ and A are independent cB⇒ and cA are independent.
Remark
If A and B are independent, knowledge about the occurrence or nonoccurrence
of A does not change the chances of the occurrence or nonoccurrence of B.

If A and B are mutually exclusive events and P(A) > 0 and P(B) > 0, then they are dependent. This is because if we are given that one has occurred, the chance of the occurrence of the other one is 0.
For example, let A be the event that the next president of the United States is a Democrat and B be the event that he or she is a Republican. Then A and B are mutually exclusive; hence they are dependent. If A occurs, that is, if the next president is a Democrat, the probability that B occurs, that is, he or she is a Republican is zero, and vice versa.
Example 3.27 Dennis arrives at his office every day at a random time between 8:00 A.M. and 9:00 A.M. Let A be the event that Dennis arrives at his office tomorrow between 8:15 A.M. and 8:45 A.M., B the event that he arrives between 8:30 A.M. and 9:00 A.M., and C the event that he arrives either between 8:15 A.M. and 8:30 A.M. or between 8:45 A.M. and 9:00 A.M. Then AB, AC, BC, and B ∪ C are the events that he arrives between 8:30 and 8:45, 8:15 and 8:30, 8:45 and 9:00, and 8:15 and 9:00, respectively. Thus P(A) = P(B) = P(C) = 1/2 and P(AB) = P(AC) = 1/4. So P(AB) = P(A)P(B) and P(AC) = P(A)P(C); that is, A is independent of B and it is independent of C. However, since BC and A are mutually exclusive, they are dependent. Also, P(A|B ∪ C) = 2/3 ≠ P(A). Thus B ∪ C and A are dependent as well.
Definition (Extended Concept of Independence)

The events A, B, and C are called independent if
( ) ( ) ( )P AB P A P B=
( ) ( ) ( )P AC P A P C=
( ) ( ) ( )P BC P B P C=
( ) ( ) ( ) ( )P ABC P A P B P C=
If A , B and C are independent, we say that { }, ,A B C is an independent set of events.
Example 3.29 Let an experiment consist of throwing a die twice. Let A be the event that in the second throw the die lands 1, 2, or 5; B the event that in the second throw it lands 4, 5 or 6; and C the event that the sum of the two outcomes is 9.
Then P(A) = P(B) = 1/2, P(C) = 1/9, and

P(AB) = 1/6 ≠ 1/4 = P(A)P(B),
P(AC) = 1/36 ≠ 1/18 = P(A)P(C),

while

P(ABC) = 1/36 = P(A)P(B)P(C).
Thus the validity of P(ABC) = P(A)P(B)P(C) is not sufficient for the independence of A, B, and C.
Definition (Generalized Concept of Independence)
The set of events {A_1, A_2, …, A_n} is called independent if for every subset {A_{i1}, A_{i2}, …, A_{ik}}, k ≥ 2, of {A_1, A_2, …, A_n},

P(A_{i1}A_{i2}⋯A_{ik}) = P(A_{i1})P(A_{i2})⋯P(A_{ik}).

By definition, the following relations must hold for all combinations 1 ≤ i < j < k < ⋯ ≤ n:

P(A_iA_j) = P(A_i)P(A_j),
P(A_iA_jA_k) = P(A_i)P(A_j)P(A_k),
⋮
P(A_1A_2⋯A_n) = P(A_1)P(A_2)⋯P(A_n).

The number of equations that must be satisfied is

C(n,2) + C(n,3) + ⋯ + C(n,n) = 2^n − C(n,0) − C(n,1) = 2^n − 1 − n.

Another view: the number of subsets of {A_1, A_2, …, A_n} is 2^n, including ∅ and the n subsets with only one event. Hence the number of equations that must be satisfied for n independent events is 2^n − n − 1.
Example 3.32 Figure 3.8 shows an electric circuit in which each of the switches located at 1, 2, 3, and 4 is independently closed or open with probabilities p and 1 p− , respectively. If a signal is fed to the input, what is the probability that it is transmitted to the output? Answer: 2 2(2 )p p−
Solution: Let E_i be the event that the switch at location i is closed, 1 ≤ i ≤ 4. A signal fed to the input will be transmitted to the output if at least one of the events E_1E_2 and E_3E_4 occurs. Hence the desired probability is

P(E_1E_2 ∪ E_3E_4) = P(E_1E_2) + P(E_3E_4) − P(E_1E_2E_3E_4) = p^2 + p^2 − p^4 = p^2(2 − p^2).
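The closed form p²(2 − p²) can be verified by summing the probabilities of all 2⁴ switch configurations; this enumeration is my own sketch.

```python
from itertools import product

def transmit_probability(p):
    """P(signal transmitted) for the circuit of Example 3.32: two
    parallel branches, each with two switches in series (1-2 and 3-4)."""
    total = 0.0
    for state in product([0, 1], repeat=4):
        prob = 1.0
        for s in state:                      # switches close independently
            prob *= p if s else 1 - p
        s1, s2, s3, s4 = state
        if (s1 and s2) or (s3 and s4):       # at least one branch conducts
            total += prob
    return total
```

For any p, transmit_probability(p) agrees with p²(2 − p²) up to rounding.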
Probability Theory. Chapter 4: Distribution Function and Discrete Random Variables 1/17
Department of Electronic Engineering, Fu-Jen Catholic University August 13, 2007
Chapter 4. Distribution Function and Discrete Random Variables
4.1. Random Variables
Quantities that do not have fixed values:
The values of such quantities depend on random actions, and they usually change from one experiment to another.
For example, the number of babies born in a certain hospital each day, the arrival time of a bus at a station, the sum of the outcomes of two dice when thrown, the amount of rainfall in Seattle during a given year, the number of earthquakes that occur in California per month, the weight of grains of wheat grown on a certain plot of ground, etc.
In probability, the quantities introduced in these diverse examples are called random variables; their numerical values are unknown in advance.
The numerical (experimental) values depend on random elements occurring at the time of the experiment, and over which we have no control.
Example In rolling two fair dice, if X is the sum, then X ∈ {2, 3, …, 12}, and we should have

Σ_{i=2}^{12} P(X = i) = 1.

For instance,

P(X = 2) = P({(1,1)}) = 1/36,
P(X = 3) = P({(1,2), (2,1)}) = 2/36,
P(X = 4) = P({(1,3), (2,2), (3,1)}) = 3/36, …
Definition Let S be the sample space of an experiment. A real-valued function X : S → ℝ is called a random variable of the experiment if, for each interval B ⊆ ℝ, {s : X(s) ∈ B} is an event.

The set {s : X(s) ∈ B} is often abbreviated as {X ∈ B}, or simply as X ∈ B.
Example 4.1 Suppose that three cards are drawn from an ordinary deck of 52 cards, one-by-one, at random and with replacement. Let X be the number of spades drawn; then X is a random variable. If an outcome of spades is denoted by s, and other outcomes are represented by t, then X is a real-valued function defined on the sample space

S = {(s,s,s), (t,s,s), (s,t,s), (s,s,t), (t,t,s), (t,s,t), (s,t,t), (t,t,t)}

by X(s,s,s) = 3, X(s,t,s) = 2, X(s,s,t) = 2, X(s,t,t) = 1, and so on. Now we must determine the values that X assumes and the probabilities that are associated with them. Clearly, X can take the values 0, 1, 2, and 3. The probabilities associated with these values are calculated as follows:

P(X = 0) = P({(t,t,t)}) = (3/4)(3/4)(3/4) = 27/64,
P(X = 1) = P({(s,t,t), (t,s,t), (t,t,s)}) = 3 × (1/4)(3/4)(3/4) = 27/64,
P(X = 2) = P({(s,s,t), (s,t,s), (t,s,s)}) = 3 × (1/4)(1/4)(3/4) = 9/64,
P(X = 3) = P({(s,s,s)}) = (1/4)(1/4)(1/4) = 1/64.
If the cards are drawn without replacement, the probabilities associated with the values 0, 1, 2, and 3 are:
P(X = 0) = C(39,3)/C(52,3),  P(X = 1) = C(13,1)C(39,2)/C(52,3),
P(X = 2) = C(13,2)C(39,1)/C(52,3),  P(X = 3) = C(13,3)/C(52,3).

Therefore,

P(X = i) = C(13,i)C(39,3−i)/C(52,3),  i = 0, 1, 2, 3.
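Both probability functions can be tabulated exactly; this sketch is mine and uses Python's math.comb for the binomial coefficients.

```python
from fractions import Fraction
from math import comb

# X = number of spades in three draws from a 52-card deck (Example 4.1).
# With replacement: P(X=i) = C(3,i) (1/4)^i (3/4)^(3-i).
with_repl = [Fraction(comb(3, i) * 3 ** (3 - i), 4 ** 3) for i in range(4)]
# Without replacement: P(X=i) = C(13,i) C(39,3-i) / C(52,3).
without_repl = [Fraction(comb(13, i) * comb(39, 3 - i), comb(52, 3))
                for i in range(4)]
```

Each list sums to 1, as a probability function must; for instance, with_repl[0] is 27/64.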
Example 4.2 A bus stops at a station every day at some random time between 11:00 A.M. and 11:30 A.M. If X is the actual arrival time of the bus, X is a random variable. It is a function defined on the sample space S = {t : 11 < t < 11½} by X(t) = t. As we know from Section 1.7, P(X = t) = 0 for any t ∈ S, and

P(X ∈ (α, β)) = (β − α)/(11½ − 11) = 2(β − α)

for any subinterval (α, β) of (11, 11½).
Example 4.3 In the U.S., the number of twin births is approximately 1 in 90. Let X be the number of births in a certain hospital until the first twins are born. X is a random variable. Denote twin births by T and single births by N. Then X is a real-valued function defined on the sample space S = {T, NT, NNT, NNNT, …} by X(NN⋯NT) = i when the T is preceded by i − 1 N's. The set of possible values of X is {1, 2, 3, …} and

P(X = i) = P(NN⋯NT) = (89/90)^{i−1} (1/90).
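With exact rational arithmetic one can confirm that these geometric probabilities behave as a probability function: the mass of the first n values plus the tail (89/90)^n is exactly 1. The cutoff 500 below is arbitrary; the sketch is mine.

```python
from fractions import Fraction

p = Fraction(1, 90)                    # P(a birth is twins)

def pmf(i):
    """P(X = i) = (89/90)**(i-1) * (1/90), for i = 1, 2, 3, ..."""
    return (1 - p) ** (i - 1) * p

partial = sum(pmf(i) for i in range(1, 501))   # mass of i = 1..500
tail = (1 - p) ** 500                          # geometric tail mass
# partial + tail == 1 exactly, since the geometric series telescopes.
```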
Example 4.5 The diameter of a flat metal disk manufactured by a factory is a random number between 4 and 4.5. What is the probability that the area of such a flat disk chosen at random is at least 4.41π? Answer: 3/5
Solution: Let D be the diameter of the metal disk selected at random. D is a random variable, and the area of the metal disk, π(D/2)^2, which is a function of D, is also a random variable. We are interested in the probability of the event πD^2/4 > 4.41π, which is

P(πD^2/4 > 4.41π) = P(D^2 > 17.64) = P(D > 4.2).

Now, to calculate P(D > 4.2), note that, since D is a random variable in the interval (4, 4.5), the probability that it falls into the subinterval (4.2, 4.5) is (4.5 − 4.2)/(4.5 − 4) = 3/5. Hence P(πD^2/4 > 4.41π) = 3/5.
Example 4.6 A random number is selected from the interval (0, π/2). What is the probability that its sine is greater than its cosine? Answer: 1/2
Solution: Let the number selected be X; then X is a random variable, and therefore sin X and cos X, which are functions of X, are also random variables. We are interested in the probability of the event sin X > cos X:
P(sin X > cos X) = P(tan X > 1) = P(X > π/4) = (π/2 − π/4)/(π/2 − 0) = 1/2,

where the first equality holds since in the interval (0, π/2), cos X > 0, and the second equality holds since, in this interval, tan X is strictly increasing.
4.2. Distribution Functions
Usually, when dealing with a random variable X, for constants a and b the computation of the probabilities P(X = a), P(X < a), P(a < X ≤ b), etc., is our ultimate goal.
Since the real-valued function ( )P X t≤ characterizes X , it tells us almost everything about X .
Definition
If X is a random variable, then the function F defined on (−∞, ∞) by F(t) = P(X ≤ t) is called the distribution function of X.

F accumulates all of the probability of the values of X up to and including t. Sometimes it is called the cumulative distribution function (cdf) of X.
Properties

F is nondecreasing: if t < u, then F(t) ≤ F(u).
Proof: The occurrence of the event {X ≤ t} implies the occurrence of the event {X ≤ u}:
{X ≤ t} ⊆ {X ≤ u} ⇒ P(X ≤ t) ≤ P(X ≤ u) ⇒ F(t) ≤ F(u).

lim_{t→∞} F(t) = 1.
Proof: It suffices to prove, for any increasing sequence {t_n} of real numbers that converges to ∞, that lim_{n→∞} F(t_n) = 1. The events {X ≤ t_n} form an increasing sequence that converges to the event ∪_{n=1}^∞ {X ≤ t_n} = S, so lim_{n→∞} F(t_n) = P(S) = 1.
lim_{t→−∞} F(t) = 0.

F is right continuous: for every t ∈ ℝ, F(t+) = F(t). This means that if {t_n} is a decreasing sequence of real numbers converging to t, then lim_{n→∞} F(t_n) = F(t).
Proof: The events {X ≤ t_n} form a decreasing sequence that converges to the event ∩_{n=1}^∞ {X ≤ t_n}, and

∩_{n=1}^∞ {X ≤ t_n} = {X ≤ t}

⇒ lim_{n→∞} P(X ≤ t_n) = P(X ≤ t) ⇒ lim_{n→∞} F(t_n) = F(t).
Some Probability Questions:

P(X > a) = 1 − P(X ≤ a) = 1 − F(a).

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a), for b > a.

P(X < a) = F(a−): the events {X ≤ a − 1/n} form an increasing sequence that converges to {X < a}, i.e., ∪_{n=1}^∞ {X ≤ a − 1/n} = {X < a}, so P(X < a) = lim_{n→∞} F(a − 1/n) = F(a−).
Example 4.7 The distribution function of a random variable X is given by

F(x) =
  0,            x < 0
  x/4,          0 ≤ x < 1
  1/2,          1 ≤ x < 2
  x/12 + 1/2,   2 ≤ x < 3
  1,            x ≥ 3.
Example 4.8 For the experiment of flipping a fair coin twice, let X be the number of tails. Calculate F(t), the distribution function of X, and sketch its graph.
Solution: Since X assumes only the values 0, 1, and 2, we have F(t) = P(X ≤ t) = 0 if t < 0.

If 0 ≤ t < 1, then F(t) = P(X ≤ t) = P(X = 0) = P({HH}) = 1/4.

If 1 ≤ t < 2, then F(t) = P(X ≤ t) = P(X = 0 or 1) = P({HH, HT, TH}) = 3/4,

and if t ≥ 2, then F(t) = P(X ≤ t) = 1. Hence

F(t) =
  0,     t < 0
  1/4,   0 ≤ t < 1
  3/4,   1 ≤ t < 2
  1,     t ≥ 2.
Example 4.9 Suppose that a bus arrives at a station every day between 10:00 A.M. and 10:30 A.M., at random. Let X be the arrival time; find the distribution function of X and sketch its graph. Answer:

F(t) =
  0,           t < 10
  2(t − 10),   10 ≤ t < 10½
  1,           t ≥ 10½.
If t < x_1, then F(t) = 0.

If x_1 ≤ t < x_2, then F(t) = P(X ≤ t) = P(X = x_1) = p(x_1).

If x_2 ≤ t < x_3, then F(t) = P(X ≤ t) = P(X = x_1 or X = x_2) = p(x_1) + p(x_2).

In general, if x_{n−1} ≤ t < x_n, then

F(t) = Σ_{i=1}^{n−1} p(x_i).

F is constant on [x_{n−1}, x_n), with jumps at x_1, x_2, x_3, …. The magnitude of the jump at x_i is p(x_i).
Example 4.11 In the experiment of rolling a balanced die twice, let X be the maximum of the two numbers obtained. Determine and sketch the probability function and the distribution function of X.

Solution: The possible values of X are 1, 2, 3, 4, 5, and 6. The sample space of this experiment consists of 36 equally likely outcomes. Hence the probability of any of them is 1/36. Thus
p(1) = P(X = 1) = P({(1,1)}) = 1/36,
p(2) = P(X = 2) = P({(1,2), (2,2), (2,1)}) = 3/36,
p(3) = P(X = 3) = P({(1,3), (2,3), (3,3), (3,2), (3,1)}) = 5/36.

Similarly, p(4) = 7/36, p(5) = 9/36, and p(6) = 11/36; p(x) = 0 for x ∉ {1, 2, 3, 4, 5, 6}. The distribution function of X is

F(x) =
  0,       x < 1
  1/36,    1 ≤ x < 2
  4/36,    2 ≤ x < 3
  9/36,    3 ≤ x < 4
  16/36,   4 ≤ x < 5
  25/36,   5 ≤ x < 6
  1,       x ≥ 6.
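The jumps p(k) = (2k − 1)/36 and the values F(k) = k²/36 can be confirmed by enumerating the 36 equally likely outcomes; this sketch is mine.

```python
from fractions import Fraction
from itertools import product

# X = maximum of two die rolls (Example 4.11).
outcomes = list(product(range(1, 7), repeat=2))
pmf = {k: Fraction(sum(max(o) == k for o in outcomes), 36)
       for k in range(1, 7)}
cdf = {k: sum(pmf[j] for j in range(1, k + 1)) for k in range(1, 7)}
```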
4.4. Expectations of Discrete Random Variables
Expected Value If X is the random variable denoting the gain in one play, then the parameter
( )E X is the expected (average) value of X . If we play the game n times and find the average of the values of X , then as
n →∞ , ( )E X is obtained.
Fair Game
If for some game E(X) = 0, then in the long run the player neither loses nor wins. Such games are called fair.
Definition
The expected value of a discrete random variable X with the set of possible values A and probability function p(x) is defined by
$$
E(X) = \sum_{x \in A} x\,p(x).
$$
E(X) exists if this sum converges absolutely.
Note: E(X) is also called the mean, the mathematical expectation, or the expectation of X. It is also denoted by E[X], EX, μ_X, or μ.
Other Views
If each value x of X is weighted by p(x) = P(X = x), then $\sum_{x \in A} x\,p(x)$ is nothing but the weighted average of X.
If we think of a unit mass distributed along the real line at the points of A so that the mass at x ∈ A is P(X = x), then E(X) is the center of gravity.
Example 4.14 We flip a fair coin twice and let X be the number of heads obtained. What is the expected value of X? Answer: E(X) = 1.
Example 4.15 We write the numbers a_1, a_2, …, a_n on n identical balls and mix them in a box. What is the expected value of the number on a ball selected at random? Answer: E(X) = (a_1 + a_2 + ⋯ + a_n)/n.
Example 4.20 The tanks of a country's army are numbered 1 to N. In a war this country loses n random tanks to the enemy, who discovers that the captured tanks are numbered. If X_1, X_2, …, X_n are the numbers of the captured tanks, what is E(max X_i)? How can the enemy use E(max X_i) to find an estimate of N, the total number of this country's tanks?
Solution: Let Y = max X_i; then
$$
P(Y = k) = \frac{\binom{k-1}{n-1}}{\binom{N}{n}} \quad \text{for } k = n,\, n+1,\, \dots,\, N,
$$
because if the maximum of the X_i's is k, the numbers of the remaining n − 1 tanks are from 1 to k − 1. Now
$$
E(Y) = \sum_{k=n}^{N} k\,P(Y = k)
     = \sum_{k=n}^{N} \frac{k\binom{k-1}{n-1}}{\binom{N}{n}}
     = \frac{1}{\binom{N}{n}} \sum_{k=n}^{N} \frac{k\,(k-1)!}{(n-1)!\,(k-n)!}
     = \frac{n}{\binom{N}{n}} \sum_{k=n}^{N} \frac{k!}{n!\,(k-n)!}
     = \frac{n}{\binom{N}{n}} \sum_{k=n}^{N} \binom{k}{n}. \quad (4.2)
$$
To calculate $\sum_{k=n}^{N} \binom{k}{n}$, note that $\binom{k}{n}$ is the coefficient of $x^n$ in the polynomial $(1+x)^k$. Therefore, $\sum_{k=n}^{N} \binom{k}{n}$ is the coefficient of $x^n$ in the polynomial $\sum_{k=n}^{N} (1+x)^k$. Since
$$
\sum_{k=n}^{N} (1+x)^k = (1+x)^n \cdot \frac{(1+x)^{N-n+1} - 1}{(1+x) - 1}
 = \frac{1}{x}\Big[(1+x)^{N+1} - (1+x)^n\Big],
$$
and the coefficient of $x^n$ in the polynomial $\frac{1}{x}\big[(1+x)^{N+1} - (1+x)^n\big]$ is $\binom{N+1}{n+1}$, we have
$$
\sum_{k=n}^{N} \binom{k}{n} = \binom{N+1}{n+1}.
$$
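This "hockey-stick" identity is easy to spot-check numerically:

```python
from math import comb

# Check sum_{k=n}^{N} C(k, n) = C(N+1, n+1) over a small grid of (n, N)
for n in range(6):
    for N in range(n, 12):
        assert sum(comb(k, n) for k in range(n, N + 1)) == comb(N + 1, n + 1)
print("identity verified for n <= 5, N <= 11")
```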
Substituting this result in (4.2), we obtain
$$
E(Y) = \frac{n\binom{N+1}{n+1}}{\binom{N}{n}}
     = \frac{n\,\dfrac{(N+1)!}{(n+1)!\,(N-n)!}}{\dfrac{N!}{n!\,(N-n)!}}
     = \frac{n(N+1)}{n+1}.
$$
To estimate N, the total number of this country's tanks, we solve
$$
E(Y) = \frac{n(N+1)}{n+1}
$$
for N. We obtain
$$
N = \frac{n+1}{n}\,E(Y) - 1.
$$
Therefore, if, for example, the enemy captures 12 tanks and the maximum of the numbers of the captured tanks is 117, then, assuming that E(Y) is approximately equal to the observed value of Y, we get N ≈ (13/12) × 117 − 1 ≈ 126.
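The resulting estimator can be tried out in a short simulation; a sketch (the helper name `estimate_N` is ours):

```python
import random

def estimate_N(captured):
    """Point estimate from the derivation above: N_hat = (n+1)/n * max(captured) - 1."""
    n = len(captured)
    return (n + 1) / n * max(captured) - 1

# The numbers from the text: 12 captured tanks, largest serial number 117
print(round(estimate_N([117] + list(range(1, 12)))))  # 126

# Sanity check: with N = 500 tanks and n = 12 captures, the estimate is close to N on average
random.seed(0)
N, n = 500, 12
trials = [estimate_N(random.sample(range(1, N + 1), n)) for _ in range(2000)]
print(sum(trials) / len(trials))  # close to 500
```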
Theorem 4.1
If X is a constant random variable, that is, if P(X = c) = 1 for a constant c, then E(X) = c.
Proof: P(X ≠ c) = 0 ⇒ E(X) = c · P(X = c) = c · 1 = c.
Theorem 4.2 (Law of the Unconscious Statistician)
Let X be a discrete random variable with set of possible values A and probability function p(x), and let g be a real-valued function. Then g(X) is a random variable with
$$
E\big[g(X)\big] = \sum_{x \in A} g(x)\,p(x).
$$
Note that g(X) is a random variable with the set of possible values g(A).
Corollary
Let X be a discrete random variable; let f_1, f_2, …, f_n be real-valued functions, and let α_1, α_2, …, α_n be real numbers. Then
$$
E\big[\alpha_1 f_1(X) + \alpha_2 f_2(X) + \cdots + \alpha_n f_n(X)\big]
 = \alpha_1 E\big[f_1(X)\big] + \alpha_2 E\big[f_2(X)\big] + \cdots + \alpha_n E\big[f_n(X)\big].
$$
Examples:
(1) E(2X³ + 5X² + 7X + 4) = 2E(X³) + 5E(X²) + 7E(X) + 4
(2) E(e^X + 2 sin X + log X) = E(e^X) + 2E(sin X) + E(log X)
(3) E is linear: E(αX + β) = αE(X) + β
Example 4.23 The probability function of the discrete random variable X is given by
$$
p(x) = \begin{cases} x/15, & x = 1, 2, 3, 4, 5 \\ 0, & \text{otherwise.} \end{cases}
$$
What is the expected value of X(6 − X)? Answer: 7.
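The answer follows from Theorem 4.2 with g(x) = x(6 − x); a one-screen check:

```python
from fractions import Fraction

# p(x) = x/15 for x = 1, ..., 5; E[g(X)] = sum of g(x) p(x)  (Theorem 4.2)
p = {x: Fraction(x, 15) for x in range(1, 6)}
expected = sum(x * (6 - x) * p[x] for x in p)
print(expected)  # 7
```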
4.5. Variances and Moments of Discrete Random Variables
Additional measures for decision-making are needed. Let Y be the true value of a quantity (the average of a large number of measurements, E(X)) minus the value X obtained by a single measurement; then Y is the error of measurement.
Y is a random variable with expected value zero. The reason is that in measuring a quantity a very large number of times, positive and negative errors of the same magnitude occur with equal probabilities:
$$
E(Y) = E(\mu - X) = \mu - E(X) = E(X) - E(X) = 0.
$$
Consider an experiment in which a quantity is measured several times and the average of the errors is found to be close to zero. Can we conclude that the measurements are very close to the true value and thus accurate? The answer is no, because they might differ from the true value by relatively large amounts that are scattered in both the positive and negative directions.
Expectation by itself does not give adequate information.
Variance of a Random Variable
Variance measures the average magnitude of the fluctuations of a random variable about its expectation.
E[X − E(X)] is not an appropriate measure for these fluctuations, since it is always zero; E|X − E(X)| is difficult to handle analytically. Instead, E[(X − E(X))²], analogous to Euclidean distance in geometry, is used and is called the variance of X.
The square root of E[(X − E(X))²] is called the standard deviation of X.
Definition
Let X be a discrete random variable with set of possible values A, probability function p(x), and E(X) = μ. Then σ_X and Var(X), called the standard deviation and the variance of X, respectively, are defined by
$$
\sigma_X = \sqrt{E\big[(X-\mu)^2\big]} \qquad \text{and} \qquad \sigma_X^2 = \mathrm{Var}(X) = E\big[(X-\mu)^2\big].
$$
By this definition and Theorem 4.2,
$$
\mathrm{Var}(X) = E\big[(X-\mu)^2\big] = \sum_{x \in A} (x-\mu)^2\,p(x).
$$
Note:
Let X be a discrete random variable with the set of possible values A and probability function p(x). Suppose the value t is predicted for X; then the error is X − t. We would like to minimize
$$
E\big[(X-t)^2\big] = \sum_{x \in A} (x-t)^2\,p(x).
$$
Setting the derivative to zero,
$$
\frac{d}{dt}\,E\big[(X-t)^2\big] = \frac{d}{dt} \sum_{x \in A} (x-t)^2\,p(x)
 = -2\sum_{x \in A} (x-t)\,p(x) = 0
 \;\Rightarrow\; t = \sum_{x \in A} x\,p(x) = E[X],
$$
so E[(X − t)²] is minimized for t = E[X], and the minimum is E[(X − E(X))²] = Var(X). Hence
$$
\mathrm{Var}(X) = \min_t\, E\big[(X-t)^2\big].
$$
The better t predicts X (the optimal choice being t = E[X]), the smaller E[(X − t)²] is; its smallest value is Var(X).
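This minimization claim can be checked on a concrete pmf, e.g. the one from Example 4.23 (p(x) = x/15 for x = 1, …, 5); a sketch:

```python
from fractions import Fraction

p = {x: Fraction(x, 15) for x in range(1, 6)}

def mse(t):
    """E[(X - t)^2] for the pmf above."""
    return sum((x - t) ** 2 * q for x, q in p.items())

mean = sum(x * q for x, q in p.items())            # E[X] = 11/3
grid = [Fraction(t, 12) for t in range(0, 73)]     # t = 0, 1/12, ..., 6
assert all(mse(t) >= mse(mean) for t in grid)      # no grid point beats t = E[X]
print(mean, mse(mean))                             # the minimum value is Var(X)
```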
Theorem 4.3: Var(X) = E(X²) − [E(X)]².
Proof:
$$
\mathrm{Var}(X) = E\big[(X-\mu)^2\big] = E\big(X^2 - 2\mu X + \mu^2\big)
 = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2 = E(X^2) - \big[E(X)\big]^2.
$$
Note 1: Since Var(X) ≥ 0, for any discrete random variable X, [E(X)]² ≤ E(X²).
Note 2: This formula is usually a better alternative for computing the variance of X.
Note 3: In circuit theory, E(X²) indicates the total power, [E(X)]² the DC power, and Var(X) = E[(X − μ)²] the AC power.
Example 4.27 What is the variance of the random variable X , the outcome of rolling a fair die? Answer: Var(X)=35/12
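The value 35/12 follows from Theorem 4.3; a short check:

```python
from fractions import Fraction

xs = range(1, 7)                              # faces of a fair die
EX = sum(Fraction(x, 6) for x in xs)          # E(X)   = 7/2
EX2 = sum(Fraction(x * x, 6) for x in xs)     # E(X^2) = 91/6
var = EX2 - EX ** 2                           # Theorem 4.3
print(var)  # 35/12
```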
Theorem 4.4
Let X be a discrete random variable with the set of possible values A and mean μ. Then Var(X) = 0 if and only if X is a constant with probability 1.
Proof:
(⇒) We have
$$
\mathrm{Var}(X) = E\big[(X-\mu)^2\big] = \sum_{x \in A} (x-\mu)^2\,p(x) = 0,
$$
and all the terms of this series are nonnegative, so x − μ = 0 for all x ∈ A. The set A consists of only one number, μ, so X = μ with probability 1.
(⇐) Conversely, if X is a constant, say μ, then X = E(X) = μ and Var(X) = E[(X − μ)²] = 0.
Theorem 4.5
Let X be a discrete random variable; then for constants a and b,
$$
\mathrm{Var}(aX+b) = a^2\,\mathrm{Var}(X), \qquad \sigma_{aX+b} = |a|\,\sigma_X.
$$
Proof:
$$
\mathrm{Var}(aX+b) = E\Big\{\big[aX+b-E(aX+b)\big]^2\Big\}
 = E\Big\{\big[aX+b-aE(X)-b\big]^2\Big\}
 = E\Big\{a^2\big[X-E(X)\big]^2\Big\} = a^2\,E\Big\{\big[X-E(X)\big]^2\Big\} = a^2\,\mathrm{Var}(X).
$$
Taking the square roots of both sides, we find that σ_{aX+b} = |a|σ_X.
Example 4.28 Suppose that, for a discrete random variable X, E(X) = 2 and E[X(X − 4)] = 5. Find the variance and the standard deviation of −4X + 12.
Answer: Var(X) = 9, Var(−4X + 12) = 144, and σ_{−4X+12} = 12.
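The arithmetic behind this answer, combining linearity with Theorems 4.3 and 4.5:

```python
# E[X(X - 4)] = E(X^2) - 4 E(X) = 5 and E(X) = 2, so E(X^2) = 5 + 4*2 = 13
EX = 2
EX2 = 5 + 4 * EX                  # 13
var_X = EX2 - EX ** 2             # Theorem 4.3: Var(X) = 13 - 4 = 9
var_Y = (-4) ** 2 * var_X         # Theorem 4.5 with a = -4: Var(-4X + 12) = 144
std_Y = var_Y ** 0.5              # standard deviation = 12.0
print(var_X, var_Y, std_Y)
```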
Concentration about an arbitrary point
Variance measures the dispersion, or spread, of a distribution about its expectation.
One way to find out which one of the two given random variables X and Y is
less dispersed, or spread, about an arbitrary point, is to see which one is more concentrated about the point (or which one has smaller variance).
Definition Let X and Y be two random variables and ω a given point. If for all t > 0,
$$
P\big(|Y - \omega| \le t\big) \le P\big(|X - \omega| \le t\big),
$$
then we say that X is more concentrated about ω than is Y.
Theorem 4.6 Suppose that X and Y are two random variables with E(X) = E(Y) = μ. If X is more concentrated about μ than is Y, then Var(X) ≤ Var(Y).
Moments
The expected value of X is also called the first moment of X. Provided that E(|g(X)|) < ∞, E[g(X)] …