Fu Jen Catholic University Course Syllabus, Academic Year 96 (2007–2008)
Offering department: Department of Electronic Engineering, 2nd year
Course code: D-5002-02802
Credits: 3
Course title (English/Chinese): Fundamentals of Probability (機率學)
Instructor (English/Chinese): Yu, Jung-Lang (余金郎)
Course Objectives
The aim of this course is to present probability in a natural way, through interesting and instructive examples and exercises that motivate the theory, definitions, theorems, and methodology. Students will: (1) acquire an understanding of the basic concepts of probability; (2) become interested in the theory of probability; (3) improve their problem-solving skills and develop their proof-writing techniques.
Course Outline
1. Axioms of Probability. (0.5 week)
2. Combinatorial Methods. (0.5 week)
3. Conditional Probability and Independence. (2.0 weeks)
4. Distribution Functions and Discrete Random Variables. (2.0 weeks)
5. Special Discrete Distributions. (1.0 week)
6. Continuous Random Variables. (1.0 week)
7. Special Continuous Distributions. (2.0 weeks)
8. Joint Distributions. (2.5 weeks)
9. More Expectations and Variances. (1.5 weeks)
10. Sums of Independent Random Variables and Limit Theorems. (1.0 week)
Prerequisites: None
Suggested follow-up course: Signals and Systems
Textbook
S. Ghahramani, Fundamentals of Probability, 3rd Edition, 2005 (新月)
References
R. E. Ziemer, Elements of Engineering Probability and Statistics, 2nd Edition, 1997 (全華)
P. L. Meyer, Introductory Probability and Statistical Applications, 2nd Edition, 1970 (中央)
Remarks: Quizzes 40%, Midterm 25%, Final 35%
Probability Theory. Chapter 0: Introduction 1/4
Department of Electronic Engineering, Fu Jen Catholic University August 13, 2007
Introduction to Probability Theory
Instructor: Dr. Jung-Lang Yu (余金郎). Office: SF709, Tel: 29052102
Lectures: Mon. 08:10-09:00, room SF338, Tue. 10:10-12:00, room SF132.
Office hours: Thu. 14:00-16:00, Fri. 14:00-16:00, room SF709 (by appointment if necessary).
E-mail: [email protected]
Textbook: Fundamentals of Probability by Saeed Ghahramani, 3rd edition, Prentice-
Hall, Inc. The course will cover some but not all of the material.
Material Download: http://www.ee.fju.edu.tw/communication/main.html
Prerequisites: Calculus.
Course description: Basic notions of probability, discrete and continuous probability distributions, expectation and conditional expectation, independence, limit theorems and notions of convergence.
My role in this course is to help you learn and understand the material. Please come and see me in my office hours right away if you have trouble with the material, so we can intervene early. Mathematics is not a collection of disassociated pieces; its parts build upon each other. Mastering mathematics is not very different from mastering a sport: you need to practice to get the moves right, and then you need to understand the dynamics to put the different moves together into a coherent strategy. Your part consists of
♦ doing the homework (math is learned by doing, not by watching). It has been shown that Time-on-Task (TOT) (which consists of taking notes, re-reading the material after class, doing homework and studying) has the highest correlation with the final grade!!
♦ asking if you do not understand. Asking does not mean that you are stupid; rather, it means that you are smart enough to realize that clarifying what bothers you is an important step in doing well in this (or any) class. For most questions there will be many grateful classmates who had the same question but did not have the guts to ask.
♦ taking advantage of my office hours. This time is set aside for you to come and ask questions. You can make the most of it by coming prepared: bring your notes with marks on where you have questions, and show me where you get stuck in solving a particular problem. I can assist you best when you have specific questions.
In terms of studying, I urge you to work with other students. Even if your schedule is such that you cannot meet with somebody else, identify somebody whom you can call. Often hours of frustration can be avoided by talking to somebody else. Think about the problems by yourself first, and then ask about the ones you had trouble with.
In preparation for class, you should read ahead in the material (find the places where the topics appear in the book and mark them in red pen). This way you can pay particular attention to anything that did not make sense to you. In addition, it is a good idea to keep a notebook in which you record definitions, theorems, and formulas (this will be a good summary for review before exams and will come in handy in any course in which you may need this material). You may also want to summarize the sections/chapters to help in studying for exams. Another use of this notebook is to record which parts of the material/homework questions you had trouble with. Often, difficulties clear up when you write them down. I will not collect or check this notebook, but that should not keep you from making entries.
Course Grade:
Your final grade will be computed as follows:
Quiz 40% (4 quizzes during the semester)
Midterm Exam 25%
Final Exam 35%
Attendance (I expect you to be on time and to stay until the end of class!) and class participation will be taken into account in borderline cases.
Important dates:
Midterm Exam: 12-16 November, 2007
Final Exam: 14-18 January, 2008
Quiz:
1st: Oct. 18, 2007
2nd: Nov. 07, 2007
3rd: Dec. 13, 2007
4th: Jan. 09, 2008
Homework: The minimum set of exercises essential for the course is listed below.
♦ Chapter 1:
Section 1.2 1,3,5,7 Section 1.4 1,3,7,9,19 Section 1.7 1,5,10
♦ Chapter 2: Section 2.2 1,3,5,8,9,11,17 Section 2.3 1,5,9 Section 2.4 1,3,5,11,13,15,21,37
♦ Chapter 3: Section 3.1 1,3,5,11 Section 3.2 1 Section 3.3 1,3,7 Section 3.4 1,3,7,9 Section 3.5 3,5,7,9,15,17,25,27,34,35
♦ Chapter 4: Section 4.2 1,5,11 Section 4.3 1,3,7,11 Section 4.4 3,5,7 Section 4.5 1,3,5,7 Section 4.6 1
♦ Chapter 5: Section 5.1 1,3,9,13,17,19 Section 5.2 1,3,5,7,9 Section 5.3 1,5,13,17
♦ Chapter 6: Section 6.1 1,3,5,7,9 Section 6.2 3,7 Section 6.3 2,3,4,5,7 Review 9,12
♦ Chapter 7: Section 7.1 1,5,11 Section 7.2 1,9,11,15,17,21 Section 7.3 3,5,6
♦ Chapter 8: Section 8.1 1,5,7,9,11,13,15,17,25 Section 8.2 1,3,5,9,13,15,17 Section 8.3 1,5,7,11,13
♦ Chapter 10: Section 10.1 3,5,7 Section 10.2 1,3,5,9,11 Section 10.3 1,3,5 Section 10.4 1,3,5,19
Section 10.5 1,3,5
♦ Chapter 11: Section 11.1 1,3,4,11,13,15,17 Section 11.2 1,11,15 Section 11.3 1,3,5,7 Section 11.4 1 Section 11.5 1,3,5
Chapter 1. Axioms of Probability
1.1. Introduction
Probability
Relative frequency interpretation
(No. of desired outcomes) / (No. of experiments)
Counting the number of different ways that a certain event can occur
P(A) = lim_{n→∞} n(A)/n, where n(A) is the number of times that A occurs in the first n repetitions of the experiment
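As a sketch of the relative-frequency interpretation, the following simulation estimates P(A) as n(A)/n. The particular event (a fair die showing a six) and the number of trials are illustrative choices, not from the text:

```python
import random

def relative_frequency(n, run_experiment, occurred):
    """Estimate P(A) as n(A)/n, the fraction of n trials in which A occurs."""
    n_a = sum(1 for _ in range(n) if occurred(run_experiment()))
    return n_a / n

random.seed(0)
# Event A: a fair die shows a six; the true probability is 1/6.
estimate = relative_frequency(100_000,
                              lambda: random.randint(1, 6),
                              lambda outcome: outcome == 6)
```

As n grows, the estimate converges to 1/6, in line with the limit above.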
1.2. Sample Space and Events
Sample Space
Set of all possible outcomes of a random experiment
Usually denoted by S
Outcomes: sample points of S
Event
Subset of S
Set of outcome(s)
Example 1.1 For the experiment of tossing a coin once, the sample space S consists of two points (outcomes), “heads” (H) and “tails” (T). Thus S = {H, T}.
Example 1.2 Suppose that an experiment consists of two steps. First a coin is flipped. If the outcome is tails, a die is tossed. If the outcome is heads, the coin is flipped again. The sample space of this experiment is S = {T1, T2, T3, T4, T5, T6, HT, HH}. For this experiment, the event of heads in the first flip of the coin is E = {HT, HH}, and the event of an odd outcome when the die is tossed is F = {T1, T3, T5}.
Relations between events
Subset
E is said to be a subset of F if, whenever E occurs, F also occurs: E ⊆ F
Equality
E and F are said to be equal if the occurrence of E implies the occurrence of F, and vice versa.
If E ⊆ F and F ⊆ E, then E = F.
Intersection
An event is called the intersection of two events E and F if it occurs only whenever E and F occur simultaneously.
E ∩ F (also written EF) = {x | x ∈ E and x ∈ F}
Union
An event is called the union of two events E and F if it occurs whenever at least one of them occurs.
E ∪ F = {x | x ∈ E or x ∈ F}
Complement
An event is called the complement of the event E if it only occurs whenever E does not occur.
E^c = S − E = {x | x ∉ E}
Difference: A − B = {x | x ∈ A and x ∉ B} = A ∩ B^c
Certain Event
An event is called certain if its occurrence is inevitable
S
Impossible Event
An event is called impossible if there is certainty in its nonoccurrence: ∅ (the empty set) = S^c.
Mutually Exclusive (disjoint)
If the joint occurrence of two events E and F is impossible, we say that E and F are mutually exclusive (disjoint).
EF = ∅. A set of events {E_1, E_2, …, E_n} is called mutually exclusive if the joint occurrence of any two of them is impossible, that is, E_iE_j = ∅ for all i ≠ j.
Example 1.7 At a busy international airport, arriving planes land on a first-come, first-served basis. Let E, F, and H be the events that there are, respectively, at least five, at
most three, and exactly two planes waiting to land. Then
♦ Ec is the event that at most four planes are waiting to land.
♦ Fc is the event that at least four planes are waiting to land.
♦ E is a subset of Fc; that is, if E occurs, then Fc occurs. Therefore, EFc = E.
♦ H is a subset of F; that is, if H occurs, then F occurs. Therefore, FH = H.
♦ E and F are mutually exclusive; that is, EF = ∅ . E and H are also mutually exclusive since EH =∅ .
♦ FHc is the event that the number of planes waiting to land is zero, one or three.
Venn Diagram: Figure 1.1
The shaded regions are the indicated events.
More relations: (E^c)^c = E, E ∪ E^c = S, EE^c = ∅
Commutative laws: E ∪ F = F ∪ E, EF = FE
Associative laws: E ∪ (F ∪ G) = (E ∪ F) ∪ G, E(FG) = (EF)G
Distributive laws: (EF) ∪ G = (E ∪ G)(F ∪ G), E(F ∪ G) = EF ∪ EG
De Morgan's first law: (E ∪ G)^c = E^c G^c, and in general (⋃_{i=1}^{n} E_i)^c = ⋂_{i=1}^{n} E_i^c
De Morgan's second law: (EG)^c = E^c ∪ G^c, and in general (⋂_{i=1}^{n} E_i)^c = ⋃_{i=1}^{n} E_i^c
A useful decomposition: E = ES = E(F ∪ F^c) = EF ∪ EF^c, where EF and EF^c are disjoint
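These identities can be checked mechanically by modeling events as Python sets inside a small sample space; the particular sets below are arbitrary examples chosen for illustration:

```python
# Sample space and three events, modeled as Python sets.
S = set(range(10))
E, F, G = {0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}

def comp(A):
    """Complement of A relative to the sample space S."""
    return S - A

# De Morgan's laws
assert comp(E | F) == comp(E) & comp(F)
assert comp(E & F) == comp(E) | comp(F)
# Distributive laws
assert (E & F) | G == (E | G) & (F | G)
assert E & (F | G) == (E & F) | (E & G)
# E = EF ∪ EF^c, with EF and EF^c disjoint
assert E == (E & F) | (E & comp(F))
assert not (E & F) & (E & comp(F))
```

Here `|`, `&`, and `-` play the roles of ∪, ∩, and set difference.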
1.3. Axioms of Probability
Definition (Probability Axioms)
Let S be the sample space of a random phenomenon. Suppose that to each event A of S a number, denoted by P(A), is associated. If P satisfies the following axioms, then it is called a probability, and the number P(A) is said to be the probability of A.
Axiom 1: P(A) ≥ 0
Axiom 2: P(S) = 1
Axiom 3: If {A_1, A_2, A_3, …} is a sequence of mutually exclusive events (i.e., A_iA_j = ∅ for i ≠ j), then
P(⋃_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i)
Note that the axioms of probability are a set of rules that must be satisfied before S and P can be considered a probability model. Probability is a real-valued nonnegative, countably additive function.
“Equally likely” or “equally probable”: we say that A and B are equally likely if P(A) = P(B). Sample points w_1 and w_2 are equally likely if P({w_1}) = P({w_2}).
Theorem 1.1 ( ) 0P ∅ = . The probability of the empty set is 0.
Theorem 1.2 Let {A_1, A_2, …, A_n} be a mutually exclusive set of events. Then
P(⋃_{i=1}^{n} A_i) = Σ_{i=1}^{n} P(A_i)
Note that if A_1 and A_2 are mutually exclusive, then P(A_1 ∪ A_2) = P(A_1) + P(A_2). Also note that 1 = P(S) = P(A ∪ A^c) = P(A) + P(A^c), and therefore P(A) ≤ 1. Hence the probability of the occurrence of an event is always some number between 0 and 1: 0 ≤ P(A) ≤ 1.
Example 1.9 A coin is called unbiased or fair if, whenever it is flipped, the probability of obtaining heads equals the probability of obtaining tails. Suppose that in an experiment an unbiased coin is flipped. The sample space of such an experiment is S = {T, H}. Since the events {H} and {T} are equally likely to occur, P({T}) = P({H}), and since they are mutually exclusive,
P({T, H}) = P({T}) + P({H}).
Hence Axioms 2 and 3 imply that
1 = P(S) = P({H, T}) = P({H}) + P({T}) = P({H}) + P({H}) = 2P({H}).
This gives P({H}) = 1/2 and P({T}) = 1/2. Now suppose that an experiment consists of flipping a biased coin for which the outcome of tails is twice as likely as heads; then P({T}) = 2P({H}). Hence
1 = P(S) = P({H, T}) = P({H}) + P({T}) = P({H}) + 2P({H}) = 3P({H}).
This shows that P({H}) = 1/3; thus P({T}) = 2/3.
Example 1.10 Sharon has baked five loaves of bread that are identical except that one of them is underweight. Sharon's husband chooses one of these loaves at random. Let B_i, 1 ≤ i ≤ 5, be the event that he chooses the i-th loaf. Since all five loaves are equally likely to be drawn, we have
P({B_1}) = P({B_2}) = P({B_3}) = P({B_4}) = P({B_5}).
But the events {B_1}, {B_2}, {B_3}, {B_4}, {B_5} are mutually exclusive, and the sample space is S = {B_1, B_2, B_3, B_4, B_5}. Therefore, by Axioms 2 and 3,
1 = P(S) = P({B_1}) + P({B_2}) + P({B_3}) + P({B_4}) + P({B_5}) = 5P({B_1}).
This gives P({B_1}) = 1/5 and hence P({B_i}) = 1/5, 1 ≤ i ≤ 5. Therefore, the probability that Sharon's husband chooses the underweight loaf is 1/5.
Theorem 1.3
Let S be the sample space of an experiment. If S has N points that are all equally likely to occur, then for any event A of S,
P(A) = N(A)/N,
where N(A) is the number of points of A.
Example 1.11 Let S be the sample space of flipping a fair coin three times and A be the event of at least two heads; then
S = {HHH, HTH, HHT, HTT, THH, THT, TTH, TTT}
and A = {HHH, HTH, HHT, THH}. So N = 8 and N(A) = 4. Therefore, the probability of at least two heads in flipping a fair coin three times is N(A)/N = 4/8 = 1/2.
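Theorem 1.3 invites a brute-force check: enumerate the equally likely outcomes and count. This sketch reproduces Example 1.11 with itertools:

```python
from itertools import product

# All 2**3 = 8 equally likely outcomes of three coin flips.
S = list(product("HT", repeat=3))
A = [w for w in S if w.count("H") >= 2]   # at least two heads
p = len(A) / len(S)                        # N(A)/N = 4/8 = 1/2
```

The same enumeration pattern works for Examples 1.12 and 1.13.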
Example 1.12 An elevator with two passengers stops at the second, third, and fourth floors. If it is equally likely that a passenger gets off at any of the three floors, what is the probability that the passengers get off at different floors?
Solution: Let a and b denote the two passengers and a2b4 mean that a gets off at the
second floor and b gets off at the fourth floor, with similar representations for the other cases. Let A be the event that the passengers get off at different floors. Then
S = {a2b2, a2b3, a2b4, a3b2, a3b3, a3b4, a4b2, a4b3, a4b4}
and A = {a2b3, a2b4, a3b2, a3b4, a4b2, a4b3}. So N = 9 and N(A) = 6. Therefore, the desired probability is N(A)/N = 6/9 = 2/3.
Example 1.13 A number is selected at random from the set of natural numbers {1, 2, 3, …, 1000}. What is the probability that the number is divisible by 3?
Solution: Here the sample space contains 1000 points, so N = 1000. Let A be the set of all numbers between 1 and 1000 that are divisible by 3. Then A = {3m : 1 ≤ m ≤ 333}, so N(A) = 333. Therefore, the probability that a random natural number between 1 and 1000 is divisible by 3 is 333/1000 = 0.333.
1.4. Basic Theorems
Theorem 1.4 For any event A, P(A^c) = 1 − P(A).
Theorem 1.5 If A ⊆ B, then P(B − A) = P(BA^c) = P(B) − P(A).
Corollary If A ⊆ B, then P(B) ≥ P(A).
Theorem 1.6 P(A ∪ B) = P(A) + P(B) − P(AB).
From Theorem 1.6, we have
P(A_1 ∪ A_2 ∪ A_3) = P(A_1) + P(A_2) + P(A_3) − P(A_1A_2) − P(A_1A_3) − P(A_2A_3) + P(A_1A_2A_3)
P(A_1 ∪ A_2 ∪ A_3 ∪ A_4) = P(A_1) + P(A_2) + P(A_3) + P(A_4) − P(A_1A_2) − P(A_1A_3) − P(A_1A_4) − P(A_2A_3) − P(A_2A_4) − P(A_3A_4) + P(A_1A_2A_3) + P(A_1A_2A_4) + P(A_1A_3A_4) + P(A_2A_3A_4) − P(A_1A_2A_3A_4)
Inclusion-Exclusion Principle:
P(⋃_{i=1}^{n} A_i) = Σ_{i=1}^{n} P(A_i) − Σ_{i<j} P(A_iA_j) + Σ_{i<j<k} P(A_iA_jA_k) − ⋯ + (−1)^{n−1} P(A_1A_2⋯A_n)
Example 1.15 Suppose that in a community of 400 adults, 300 bike or swim or do both, 160 swim, and 120 swim and bike. What is the probability that an adult, selected at random
from this community, bikes? Answer: 0.65
Example 1.16 A number is chosen at random from the set of numbers {1, 2, 3, …, 1000}. What is the probability that it is divisible by 3 or 5 (i.e., by 3 or 5 or both)? Answer: 0.467
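Example 1.16 is a direct application of inclusion-exclusion with A = “divisible by 3” and B = “divisible by 5” (so AB = “divisible by 15”). A quick check:

```python
# P(A ∪ B) = P(A) + P(B) - P(AB): count multiples of 3, 5, and 15 in 1..1000.
N = 1000
favorable = N // 3 + N // 5 - N // 15      # 333 + 200 - 66 = 467
p = favorable / N                          # 0.467

# Cross-check by direct enumeration of the sample space.
direct = sum(1 for k in range(1, N + 1) if k % 3 == 0 or k % 5 == 0)
assert favorable == direct
```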
Theorem 1.7 P(A) = P(AB) + P(AB^c)
Example 1.19 In a community, 32% of the population are male smokers; 27% are female smokers. What percentage of the population of this community smoke?
Answer: 59%
Odds: In the real world, especially for games of chance, it is common to express probability in terms of odds.
The odds in favor of an event A are r to s if P(A)/P(A^c) = r/s (equivalently, P(A) = r/(r + s)).
The odds against an event A are r to s if P(A^c)/P(A) = r/s (equivalently, P(A) = s/(r + s)).
Thus if the odds in favor of an event A are r to s, then the odds against A are s to r.
Example 1.11: the odds in favor of HHH are 1/8 to 7/8 or, equivalently, 1 to 7.
1.5. Continuity of Probability Function
Continuity of a real-valued function f: lim_{x→c} f(x) = f(c) for all c ∈ R,
which is equivalent to the sequential criterion lim_{n→∞} f(x_n) = f(lim_{n→∞} x_n) for every convergent sequence {x_n}_{n=1}^{∞} in R.
Increasing sequence: A sequence {E_n, n ≥ 1} of events of a sample space is called increasing if E_1 ⊆ E_2 ⊆ E_3 ⊆ ⋯ ⊆ E_n ⊆ E_{n+1} ⊆ ⋯. Therefore, lim_{n→∞} E_n = ⋃_{n=1}^{∞} E_n.
Decreasing sequence: A sequence {E_n, n ≥ 1} of events of a sample space is called decreasing if E_1 ⊇ E_2 ⊇ E_3 ⊇ ⋯ ⊇ E_n ⊇ E_{n+1} ⊇ ⋯. Thus, lim_{n→∞} E_n = ⋂_{n=1}^{∞} E_n.
Theorem 1.8 (Continuity of Probability Function) For any increasing or decreasing sequence of events {E_n, n ≥ 1},
lim_{n→∞} P(E_n) = P(lim_{n→∞} E_n)
1.6. Probabilities 0 and 1
There exist infinitely many events each with probability 1, and infinitely many events each with probability 0.
P(E) = 1 does not mean E = S, and P(E) = 0 does not mean E = ∅.
In random selection of points from a continuous interval, say (0, 1), the probability of the occurrence of any particular point is 0 (shown on page 30 via an example). Let E = {t}, t ∈ (0, 1), and B_t = (0, 1) − {t}. Then P(E) = 0 and P(B_t) = 1. Therefore, there are infinitely many events with probability 0 and infinitely many events with probability 1.
1.7. Random Selection of Points from Intervals
The probability that a random selection of points from (a, b) yields any particular point is 0 (Section 1.6).
The probability that a random point selected from (a, b) falls into the interval (a, (a + b)/2) is 1/2. The probability that it falls into ((a + b)/2, b) is also 1/2.
The probability of the event that a random point from (a, b) falls into a subinterval (α, β) is equal to (β − α)/(b − a). (This can be deduced from Theorem 1.3.)
Definition (uniform distribution)
A point is said to be randomly selected from an interval (a, b) if any two subintervals of (a, b) that have the same length are equally likely to include the point. The probability associated with the event that the subinterval (α, β) contains the point is defined to be (β − α)/(b − a).
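A short simulation illustrates the definition; the interval (2, 7) and subinterval (3, 4.5) are arbitrary illustrative choices:

```python
import random

random.seed(1)
a, b = 2.0, 7.0          # the interval (a, b)
alpha, beta = 3.0, 4.5   # a subinterval (alpha, beta)

# Fraction of uniform draws from (a, b) that land in (alpha, beta).
n = 200_000
hits = sum(1 for _ in range(n) if alpha < random.uniform(a, b) < beta)
estimate = hits / n
exact = (beta - alpha) / (b - a)   # (β − α)/(b − a) = 1.5/5 = 0.3
```

The empirical fraction approaches (β − α)/(b − a) as n grows.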
Chapter 2. Combinatorial Methods
2.1. Introduction
Combinatorial Analysis
Dealing with methods of counting
Enabling us to count systematically
Some problems in probability can be solved just by counting the total number of sample points and the number of ways that an event can occur when all sample points are equally likely. (c.f. Theorem 1.3)
2.2. Counting Principle
Theorem 2.1 (Counting Principle) If the set E contains n elements and the set F contains m elements, there are nm ways in which we can choose first an element of E and then an element of F.
Theorem 2.2 (Generalized Counting Principle) Let E_1, E_2, …, E_k be sets with n_1, n_2, …, n_k elements, respectively. Then there are n_1 × n_2 × ⋯ × n_k ways in which we can first choose an element of E_1, then an element of E_2, then an element of E_3, …, and finally an element of E_k.
Example 2.1 How many outcomes are there if we throw five dice? Answer: 6^5 = 7776
Example 2.2 In tossing four fair dice, what is the probability of at least one 3? Answer: 671/1296 ≈ 0.52
Example 2.4 Rose has invited n friends to her birthday party. If they all attend, and each one shakes hands with everyone else at the party exactly once, what is the number of handshakes? Answer: n(n+1)/2
Example 2.6 (Standard Birthday Problem) What is the probability that at least two students of a class of size n have the same birthday? Compute the numerical values of such
probabilities for n = 23, 30, 50, and 60. Assume that the birth rates are constant throughout the year and that each year has 365 days. Answer: 0.507, 0.706, 0.970, 0.994.
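The birthday probabilities follow from Theorem 1.3: P(no two share a birthday) = 365 × 364 × ⋯ × (365 − n + 1) / 365^n, and the answer is its complement. A sketch:

```python
def p_shared_birthday(n):
    """P(at least two of n people share a birthday), assuming 365 equally likely days."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

probs = {n: p_shared_birthday(n) for n in (23, 30, 50, 60)}
```

Note that n = 23 is already enough to push the probability past 1/2.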
Power Set The set of all subsets of a set A is called the power set of A.
Theorem 2.3 (Number of Subsets of a Set)
A set with n elements has 2^n subsets. (The proof is based on the generalized counting principle.)
Example 2.7 A restaurant advertises that it offers over 1000 varieties of pizza. If, at the restaurant, it is possible to have on a pizza any combination of pepperoni, mushrooms, sausage, green peppers, onions, anchovies, salami, bacon, olives, and ground beef, is the restaurant’s advertisement true? Answer: True.
Tree Diagrams
Useful pictorial representations breaking down a complex counting problem into smaller, more tractable ones.
The number of possible ways that an experiment can be performed is finite. Systematically identifying all possible cases.
Example 2.8 Bill and John keep playing chess until one of them wins two games in a row or three games altogether. In what percent of all possible cases does the game end because Bill wins three games without winning two in a row? Answer: 10%
Example 2.9 Mark has $4. He decides to bet $1 on the flip of a fair coin four times. What is the probability that (a) he breaks even; (b) he wins money? Answer: 6/16, 5/16.
2.3. Permutations
Definition An ordered arrangement of r objects from a set A containing n objects (0 < r ≤ n) is called an r-element permutation of A, or a permutation of the elements of A taken r at a time. The number of r-element permutations of a set A containing n elements is denoted by nPr.
Notes: nPr = n(n − 1)(n − 2)⋯(n − r + 1) = n!/(n − r)!
and
nPn = n(n − 1)(n − 2)⋯1 = n!
Example 2.10 Three people, Brown, Smith, and Jones, must be scheduled for job interviews. In how many different orders can this be done? Answer: 6.
Example 2.11 Suppose that two anthropology, four computer science, three statistics, three biology, and five music books are put on a bookshelf in a random arrangement. What is the probability that the books of the same subject are together? Answer: P(A) = (5! × 2! × 4! × 3! × 3! × 5!)/17! ≈ 7 × 10^{−8}
Example 2.12 If five boys and five girls sit in a row in random order, what is the probability that no two children of the same sex sit together? Answer: P(A) = (2 × 5! × 5!)/10! ≈ 0.008
Theorem 2.4
The number of distinguishable permutations of n objects of k different types, where
n_1 are alike, n_2 are alike, …, n_k are alike, and n = n_1 + n_2 + ⋯ + n_k, is
n!/(n_1! × n_2! × ⋯ × n_k!).
Example 2.13 How many different 10-letter codes can be made using three a's, four b's, and three c's? Answer: 10!/(3! × 4! × 3!) = 4200
Example 2.15 A fair coin is flipped 10 times. What is the probability of obtaining exactly three heads? Answer: P(A) = [10!/(3! × 7!)]/2^{10} ≈ 0.12
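Theorem 2.4 and Example 2.13 can be verified numerically; the brute-force check on a smaller word is an added illustration:

```python
from math import factorial
from itertools import permutations

# Example 2.13: 10!/(3! * 4! * 3!) distinguishable 10-letter codes.
codes = factorial(10) // (factorial(3) * factorial(4) * factorial(3))

# Brute-force check of Theorem 2.4 on a smaller word: "aaabb" has
# 5!/(3! * 2!) = 10 distinguishable arrangements.
small = len(set(permutations("aaabb")))
```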
2.4. Combinations
Definition An unordered arrangement of r objects from a set A containing n objects (0 < r ≤ n) is called an r-element combination of A, or a combination of the elements of A taken r at a time.
Notes: nCr = C(n, r) = n!/[(n − r)! × r!] = nPr/r!
C(n, 0) = C(n, n) = 1, C(n, 1) = C(n, n − 1) = n
For any 0 ≤ r ≤ n:
C(n, r) = C(n, n − r) and C(n + 1, r) = C(n, r) + C(n, r − 1)
Example 2.16 In how many ways can two mathematics and three biology books be selected from eight mathematics and six biology books? Answer: 560
Example 2.18 In a small town, 11 of the 25 schoolteachers are against abortion, eight are for abortion, and the rest are indifferent. A random sample of five schoolteachers is selected for an interview. What is the probability that (a) all of them are for abortion; (b) all of them have the same opinion? Answer: 0.0011, 0.0099
Example 2.20 From an ordinary deck of 52 cards, seven cards are drawn at random and without replacement. What is the probability that at least one of the cards is a king? Answer: 0.4496
Example 2.21 What is the probability that a poker hand is a full house? A poker hand consists of five randomly selected cards from an ordinary deck of 52 cards. It is a full house if three cards are of one denomination and two cards are of another denomination: for example, three queens and two 4’s.
Answer: [C(13, 1) × C(12, 1) × C(4, 3) × C(4, 2)] / C(52, 5) ≈ 0.0014
Example 2.22 Show that the number of different ways n indistinguishable objects can be placed into k distinguishable cells is C(n + k − 1, n) = C(n + k − 1, k − 1).
Example 2.23 Let n be a positive integer, and let x_1 + x_2 + ⋯ + x_k = n be a given equation. A vector (x_1, x_2, …, x_k) satisfying x_1 + x_2 + ⋯ + x_k = n is said to be a nonnegative integer solution of the equation if, for each i, 1 ≤ i ≤ k, x_i is a nonnegative integer. It is said to be a positive integer solution of the equation if, for each i, 1 ≤ i ≤ k, x_i is a positive integer. (a) How many distinct nonnegative integer solutions does the equation x_1 + x_2 + ⋯ + x_k = n have? (b) How many distinct positive integer solutions does it have? Answer: (a) C(n + k − 1, k − 1); (b) C(n − 1, k − 1).
Theorem 2.5 (Binomial Expansion) For any integer n ≥ 0,
(x + y)^n = Σ_{i=0}^{n} C(n, i) x^{n−i} y^i
= C(n, 0)x^n + C(n, 1)x^{n−1}y + C(n, 2)x^{n−2}y^2 + ⋯ + C(n, n − 1)xy^{n−1} + C(n, n)y^n
Example 2.25 What is the coefficient of x^2 y^3 in the expansion of (2x + 3y)^5? Answer: 1080
Example 2.26 Evaluate the sum C(n, 0) + C(n, 1) + C(n, 2) + C(n, 3) + ⋯ + C(n, n).
Answer: 2^n.
Example 2.27 Evaluate the sum C(n, 1) + 2C(n, 2) + 3C(n, 3) + ⋯ + nC(n, n).
Answer: n2^{n−1}.
Solution: i × C(n, i) = i × n!/[i!(n − i)!] = n × (n − 1)!/[(i − 1)!(n − i)!] = n × C(n − 1, i − 1).
So from Example 2.26, we have
C(n, 1) + 2C(n, 2) + 3C(n, 3) + ⋯ + nC(n, n) = n[C(n − 1, 0) + C(n − 1, 1) + C(n − 1, 2) + ⋯ + C(n − 1, n − 1)] = n2^{n−1}.
i
Example 2.28 Prove that Σ_{i=0}^{n} C(n, i)^2 = C(2n, n).
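The three identities in Examples 2.26–2.28 can be verified numerically with `math.comb`; a sketch:

```python
from math import comb

# Check, for several n:
#   C(n,0) + C(n,1) + ... + C(n,n)          == 2**n           (Example 2.26)
#   C(n,1) + 2C(n,2) + ... + nC(n,n)        == n * 2**(n-1)   (Example 2.27)
#   C(n,0)^2 + C(n,1)^2 + ... + C(n,n)^2    == C(2n, n)       (Example 2.28)
for n in range(1, 16):
    assert sum(comb(n, i) for i in range(n + 1)) == 2 ** n
    assert sum(i * comb(n, i) for i in range(1, n + 1)) == n * 2 ** (n - 1)
    assert sum(comb(n, i) ** 2 for i in range(n + 1)) == comb(2 * n, n)
```

A numerical check is not a proof, of course, but it is a quick way to catch algebra mistakes.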
Theorem 2.6 (Multinomial Expansion)
(x_1 + x_2 + ⋯ + x_k)^n = Σ n!/(n_1! n_2! ⋯ n_k!) x_1^{n_1} x_2^{n_2} ⋯ x_k^{n_k}
Note that the sum is taken over all nonnegative integers n_1, n_2, …, n_k such that n_1 + n_2 + ⋯ + n_k = n.
2.5. Stirling’s Formula
Theorem 2.7 (Stirling’s Formula) n! ∼ √(2πn) n^n e^{−n},
where the sign ∼ means lim_{n→∞} n!/[√(2πn) n^n e^{−n}] = 1.
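A quick numerical check of Stirling’s formula; the sample values of n are arbitrary:

```python
from math import e, factorial, pi, sqrt

def stirling(n):
    """Stirling's approximation: sqrt(2*pi*n) * n**n * e**(-n)."""
    return sqrt(2 * pi * n) * n ** n * e ** (-n)

# The ratio n!/stirling(n) decreases toward 1 as n grows.
ratios = [factorial(n) / stirling(n) for n in (1, 10, 100)]
```

Even at n = 10 the relative error is under 1%, which is why the formula is so useful for approximating large factorials.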
Chapter 3. Conditional Probability and Independence
3.1. Conditional Probability
The conditional probability of A given B
Knowing that event B has occurred changes the chances of the occurrence of event A
Denoted by P(A|B): the probability of A given B
Definition:
If P(B) > 0, then P(A|B) = P(AB)/P(B). (AB denotes the joint occurrence of the two events.)
Example 3.1 In a certain region of Russia, the probability that a person lives at least 80 years is 0.75, and the probability that he or she lives at least 90 years is 0.63. What is the probability that a randomly selected 80-year-old person from this region will survive to become 90? Answer: 0.84
Example 3.2 From the set of all families with two children, a family is selected at random and is found to have a girl. What is the probability that the other child of the family is a girl? Assume that in a two-child family all sex distributions are equally probable. Answer: 1/3
Example 3.3 From the set of all families with two children, a child is selected at random and is found to be a girl. What is the probability that the second child of this girl’s family is also a girl? Assume that in a two-child family all sex distributions are equally probable. Answer: 1/2
Example 3.5 We draw eight cards at random from an ordinary deck of 52 cards. Given that three of them are spades, what is the probability that the remaining five are also spades? Answer: 65.44 10−×
Solution: Interpret B as the event that at least three of the eight cards are spades, and let A be the event that all eight are spades (so AB = A). Then
P(A|B) = P(AB)/P(B) = [C(13, 8)/C(52, 8)] / [Σ_{x=3}^{8} C(13, x)C(39, 8 − x)/C(52, 8)] ≈ 5.44 × 10^{−6}.
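Example 3.5 can be checked numerically with `math.comb`; here B is read as “at least three of the eight cards are spades” and A as “all eight are spades”, so AB = A:

```python
from math import comb

# B: at least three of the eight cards are spades; A: all eight are spades (AB = A).
total = comb(52, 8)
p_ab = comb(13, 8) / total
p_b = sum(comb(13, x) * comb(39, 8 - x) for x in range(3, 9)) / total
p_a_given_b = p_ab / p_b
```

The common factor C(52, 8) cancels, so the answer reduces to C(13, 8) divided by the sum in the denominator.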
Theorem 3.1
Let S be the sample space of an experiment, and let B be an event of S with P(B) > 0. Then the conditional probability satisfies the probability axioms, i.e.,
(1) P(A|B) ≥ 0 for any event A of S
(2) P(S|B) = 1
(3) If A_1, A_2, …, A_n, … is a sequence of mutually exclusive events, then
P(⋃_{i=1}^{∞} A_i | B) = Σ_{i=1}^{∞} P(A_i|B)
Reduction of Sample Space Let B be an event of a sample space S with P(B) > 0. For A ⊆ B, define Q(A) = P(A|B).
Q: subsets of B → [0, 1], with Q(A) ≥ 0 and Q(B) = P(B|B) = 1, and
if A_1, A_2, …, A_n, … is a sequence of mutually exclusive subsets of B, then
Q(⋃_{i=1}^{∞} A_i) = P(⋃_{i=1}^{∞} A_i | B) = Σ_{i=1}^{∞} P(A_i|B) = Σ_{i=1}^{∞} Q(A_i)
Thus Q satisfies the probability axioms and is a probability function.
For Q, the sample space is reduced from S to B. (cf. for P, the sample space is S.) Suppose that we are interested in P(E|B), where E ⊆ S. It is usually easier to compute Q(EB) than P(E|B) directly.
Example 3.6 A child mixes 10 good and three dead batteries. To find the dead batteries, his father tests them one-by-one and without replacement. If the first four batteries tested are all good, what is the probability that the fifth one is dead? Answer: 3/9
Example 3.7 A farmer decides to test four fertilizers for his soybean fields. He buys 32 bags of fertilizers, eight bags from each kind, and tries them randomly on 32 plots, eight plots from each of field A, B, C, and D, one bag per plot. If from type I fertilizer one bag is tried on field A and three on field B, what is the probability that two bags of type I fertilizer are tried on field D? Answer: 0.43
For all choices of B with P(B) > 0:
P(∅|B) = 0
P(A^c|B) = 1 − P(A|B)
If C ⊆ A, then P(AC^c|B) = P(A − C|B) = P(A|B) − P(C|B)
If C ⊆ A, then P(C|B) ≤ P(A|B)
P(A ∪ C|B) = P(A|B) + P(C|B) − P(AC|B)

P(A|B) = P(AC|B) + P(AC^c|B)

P(∪_{i=1}^n A_i | B) = Σ_{i=1}^n P(A_i|B) − Σ_{i<j} P(A_iA_j|B) + ⋯ + (−1)^{n−1} P(A_1A_2⋯A_n|B)

For any increasing or decreasing sequence of events {A_n, n ≥ 1}, lim_{n→∞} P(A_n|B) = P(lim_{n→∞} A_n | B).
3.2. Law of Multiplication
Law of Multiplication Calculating P(AB) from P(A|B) or P(B|A):

P(AB) = P(B)P(A|B),  P(B) > 0,
P(BA) = P(A)P(B|A),  P(A) > 0.

Since P(AB) = P(BA), we have P(AB) = P(A)P(B|A) = P(B)P(A|B).
Example 3.9 Suppose that five good fuses and two defective ones have been mixed up. To find the defective fuses, we test them one-by-one, at random and without replacement. What is the probability that we are lucky and find both of the defective fuses in the first two tests? Answer: 1/21
Solution: Let D_1 and D_2 be the events of finding a defective fuse in the first and second tests, respectively. We are interested in P(D_1D_2). Using (3.5), we get

P(D_1D_2) = P(D_1)P(D_2|D_1) = (2/7)(1/6) = 1/21.
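Because only 7! = 5,040 orderings of the fuses exist, the multiplication-law answer can also be confirmed by exhaustive enumeration. This check is my own sketch; the fuse labels are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# Label the two defective fuses D1, D2 and the five good ones G1..G5,
# then count the orderings whose first two tests hit the defective pair.
fuses = ["D1", "D2", "G1", "G2", "G3", "G4", "G5"]
favorable = sum(set(p[:2]) == {"D1", "D2"} for p in permutations(fuses))
prob = Fraction(favorable, factorial(len(fuses)))
# Matches P(D1 D2) = P(D1) P(D2 | D1) = (2/7)(1/6) = 1/21.
```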
Extended Law of Multiplication:

P(ABC) = P(A)P(B|A)P(C|AB),

since

P(A)P(B|A)P(C|AB) = P(A) · [P(BA)/P(A)] · [P(CAB)/P(AB)] = P(ABC).
Theorem 3.2 (Generalized Law of Multiplication) If P(A_1A_2⋯A_{n−1}) > 0, then

P(A_1A_2⋯A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1A_2)⋯P(A_n|A_1A_2⋯A_{n−1}).
Example 3.11 Suppose that five good and two defective fuses have been mixed up. To find the defective ones, we test them one-by-one, at random and without replacement. What is the probability that we find both of the defective fuses in exactly three tests? Answer: 0.095
Solution: Let D1, D2, and D3 be the events that the first, second, and third fuses tested are defective, respectively. Let G1, G2, and G3 be the events that the first, second, and third fuses tested are good, respectively. We are interested in the probability of the event
G_1D_2D_3 ∪ D_1G_2D_3. We have

P(G_1D_2D_3 ∪ D_1G_2D_3) = P(G_1D_2D_3) + P(D_1G_2D_3)
= P(G_1)P(D_2|G_1)P(D_3|G_1D_2) + P(D_1)P(G_2|D_1)P(D_3|D_1G_2)
= (5/7)(2/6)(1/5) + (2/7)(5/6)(1/5) ≈ 0.095.
3.3. Law of Total Probability
Theorem 3.3 (Law of Total Probability)
Let B be an event with P(B) > 0 and P(B^c) > 0. Then for any event A,

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).

Proof: P(A) = P(AB ∪ AB^c) = P(AB) + P(AB^c) = P(A|B)P(B) + P(A|B^c)P(B^c).
Example 3.12 An insurance company rents 35% of the cars for its customers from agency I and 65% from agency II. If 8% of the cars of agency I and 5% of the cars of agency II break down during the rental periods, what is the probability that a car rented by this insurance company breaks down?
Answer: 0.0605
P(A) = P(A|I)P(I) + P(A|II)P(II) = (0.08)(0.35) + (0.05)(0.65) = 0.0605.
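The arithmetic is a direct instance of Theorem 3.3; a minimal sketch (the dictionary names are mine):

```python
# Law of total probability with the partition {agency I, agency II}:
# P(breakdown) = P(B | I) P(I) + P(B | II) P(II).
p_agency = {"I": 0.35, "II": 0.65}          # P(car comes from agency)
p_break_given = {"I": 0.08, "II": 0.05}     # P(breakdown | agency)

p_break = sum(p_break_given[a] * p_agency[a] for a in p_agency)
```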
Example 3.14 (Gambler’s Ruin Problem) Two gamblers play the game of “heads or tails,” in which each time a fair coin lands heads up, player A wins $1 from B, and each time it lands tails up, players B wins $1 from A. Suppose that player A initially has a dollars and player B has b dollars. If they continue to play this game successively, what is the probability that (a) A will be ruined; (b) the game goes forever with nobody winning? Answer: b/(a+b), either A is ruined or B is ruined.
Solution:
(a) Let E be the event that A will be ruined if he or she starts with i dollars, and let p_i = P(E). Our aim is to calculate p_a. To do so, we define F to be the event that A wins the first game. Then

P(E) = P(E|F)P(F) + P(E|F^c)P(F^c).

In this formula, P(E|F) is the probability that A will be ruined given that he wins the first game; so P(E|F) is the probability that A will be ruined if his capital is i + 1; that is, P(E|F) = p_{i+1}. Similarly, P(E|F^c) = p_{i−1}. Hence

p_i = (1/2)p_{i+1} + (1/2)p_{i−1}.   (3.8)
Now p_0 = 1, because if A starts with 0 dollars, he or she is already ruined. Also, if the capital of A reaches a + b, then B is ruined; thus p_{a+b} = 0. Therefore, we have to solve the system of recursive equations (3.8), subject to the boundary conditions p_0 = 1 and p_{a+b} = 0. To do so, note that (3.8) implies that
p_{i+1} − p_i = p_i − p_{i−1}.

Hence, letting α = p_1 − p_0, we get

p_{i+1} − p_i = p_i − p_{i−1} = p_{i−1} − p_{i−2} = ⋯ = p_1 − p_0 = α.

Thus
p_1 = p_0 + α,
p_2 = p_1 + α = p_0 + 2α,
p_3 = p_2 + α = p_0 + 3α,
⋮
p_i = p_0 + iα.
Now p_0 = 1 gives p_i = 1 + iα. But p_{a+b} = 0; thus 1 + (a + b)α = 0. This gives α = −1/(a + b); therefore,

p_i = 1 − i/(a + b) = (a + b − i)/(a + b).
In particular, /( )ap b a b= + . That is, the probability that A will be ruined is b/(a+b).
(b) The same method can be used, with obvious modifications, to calculate q_i, the probability that B is ruined if he or she starts with i dollars. The result is

q_i = (a + b − i)/(a + b).
Since B starts with b dollars, he or she will be ruined with probability q_b = a/(a + b). Thus the probability that the game goes on forever with nobody winning is 1 − (q_b + p_a). But 1 − (q_b + p_a) = 1 − a/(a + b) − b/(a + b) = 0. Therefore, if this game is played successively, eventually either A is ruined or B is ruined.
Remark 3.2 If the game is not fair and on each play gambler A wins $1 from B with probability p, 0 < p < 1, p ≠ 1/2, and loses $1 to B with probability q = 1 − p, relation (3.8) becomes

p_i = p·p_{i+1} + q·p_{i−1}.

But p_i = (p + q)p_i, so (p + q)p_i = p·p_{i+1} + q·p_{i−1}. This gives

q(p_i − p_{i−1}) = p(p_{i+1} − p_i),

which reduces to

p_{i+1} − p_i = (q/p)(p_i − p_{i−1}).

Using this relation and following the same line of argument lead us to

p_i = p_0 + [1 + (q/p) + (q/p)^2 + ⋯ + (q/p)^{i−1}]α,

where α = p_1 − p_0. Using p_0 = 1, we obtain

p_i = 1 + α · [(q/p)^i − 1] / [(q/p) − 1].
After finding α from p_{a+b} = 0 and substituting in this equation, we get
p_i = [(q/p)^i − (q/p)^{a+b}] / [1 − (q/p)^{a+b}] = [1 − (p/q)^{a+b−i}] / [1 − (p/q)^{a+b}],

where the last equality is obtained by multiplying the numerator and denominator of the fraction by (p/q)^{a+b}. In particular,

p_a = [1 − (p/q)^b] / [1 − (p/q)^{a+b}],

and, similarly,

q_b = [1 − (q/p)^a] / [1 − (q/p)^{a+b}].
In this case also, p_a + q_b = 1, meaning that eventually either A or B will be ruined and the game does not go on forever. For comparison, suppose that A and B both start with $10. If they play a fair game, the probability that A will be ruined is 1/2, and the probability that B will be ruined is also 1/2. If they play an unfair game with p = 3/4, q = 1/4, then p_{10}, the probability that A will be ruined, is almost 0.00002.
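Both conclusions, the ruin probability b/(a + b) in the fair game and the tiny ruin probability for the favored player when p = 3/4, can be checked by simulating the walk of A's capital directly. This simulation is my own sketch; the trial count and seed are arbitrary.

```python
import random

def ruin_probability(a, b, p, trials=10_000, seed=7):
    """Estimate the probability that gambler A, starting with a dollars
    and winning each round with probability p, hits 0 before a + b."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        capital = a
        while 0 < capital < a + b:
            capital += 1 if rng.random() < p else -1
        ruined += capital == 0
    return ruined / trials

fair = ruin_probability(10, 10, 0.5)     # theory: b/(a+b) = 1/2
unfair = ruin_probability(10, 10, 0.75)  # theory: ~0.00002, essentially 0
```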
Definition (Partition) Let {B_1, B_2, …, B_n} be a set of nonempty subsets of the sample space S of an experiment. If the events B_1, B_2, …, B_n are mutually exclusive and ∪_{i=1}^n B_i = S, the set {B_1, B_2, …, B_n} is called a partition of S.

Note: For any event A with A and A^c both nonempty, the set {A, A^c} is a partition of S.
Theorem 3.4 (Law of Total Probability) If {B_1, B_2, …, B_n} is a partition of the sample space of an experiment and P(B_i) > 0 for i = 1, 2, …, n, then for any event A of S we have

P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + ⋯ + P(A|B_n)P(B_n) = Σ_{i=1}^n P(A|B_i)P(B_i).

More generally, let {B_1, B_2, …} be a sequence of mutually exclusive events of S such that ∪_{i=1}^∞ B_i = S. Suppose that, for all i ≥ 1, P(B_i) > 0. Then for any event A of S,

P(A) = Σ_{i=1}^∞ P(A|B_i)P(B_i).
Example 3.17 An urn contains 10 white and 12 red chips. Two chips are drawn at random and, without looking at their colors, are discarded. What is the probability that a third chip drawn is red? Answer: 12/22
Solution: For i ≥ 1, let R_i be the event that the i-th chip drawn is red and W_i the event that it is white. Intuitively, it should be clear that the two discarded chips provide no information, so P(R_3) = 12/22, the same as if it were the first chip drawn from the urn. To prove this mathematically, note that {R_2W_1, W_2R_1, R_2R_1, W_2W_1} is a partition of the sample space; therefore,

P(R_3) = P(R_3|R_2W_1)P(R_2W_1) + P(R_3|W_2R_1)P(W_2R_1) + P(R_3|R_2R_1)P(R_2R_1) + P(R_3|W_2W_1)P(W_2W_1).   (3.9)
Now

P(R_2W_1) = P(R_2|W_1)P(W_1) = (12/21)(10/22) = 20/77,
P(W_2R_1) = P(W_2|R_1)P(R_1) = (10/21)(12/22) = 20/77,
P(R_2R_1) = P(R_2|R_1)P(R_1) = (11/21)(12/22) = 22/77,
P(W_2W_1) = P(W_2|W_1)P(W_1) = (9/21)(10/22) = 15/77.
Substituting these values in (3.9), we get

P(R_3) = (11/20)(20/77) + (11/20)(20/77) + (10/20)(22/77) + (12/20)(15/77) = 12/22.
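The partition computation above can be reproduced exactly with rational arithmetic; the sketch below is mine (the variable names are illustrative).

```python
from fractions import Fraction

F = Fraction
# Joint probabilities of the colors of the two discarded chips
# (10 white, 12 red in the urn), as in Example 3.17:
p_RW = F(12, 21) * F(10, 22)   # P(R2 W1)
p_WR = F(10, 21) * F(12, 22)   # P(W2 R1)
p_RR = F(11, 21) * F(12, 22)   # P(R2 R1)
p_WW = F(9, 21) * F(10, 22)    # P(W2 W1)

# Condition on what remains among the 20 chips left in the urn:
p_R3 = (F(11, 20) * p_RW + F(11, 20) * p_WR
        + F(10, 20) * p_RR + F(12, 20) * p_WW)
# p_R3 reduces to 12/22, the same as for the very first draw.
```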
3.4. Bayes’ Formula
Theorem 3.5 (Bayes' Theorem)
Let {B_1, B_2, …, B_n} be a partition of the sample space S of an experiment. If P(B_i) > 0 for i = 1, 2, …, n, then for any event A of S with P(A) > 0,

P(B_k|A) = P(A|B_k)P(B_k) / [P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + ⋯ + P(A|B_n)P(B_n)].

(Note that P(B_k|A) = P(B_kA)/P(A).)

In statistical applications of Bayes' theorem, B_1, B_2, …, B_n are called hypotheses, P(B_i) is called the prior probability of B_i, and the conditional probability P(B_i|A) is called the posterior probability of B_i after the occurrence of A.
Simplest form of Bayes' formula:

For any event B of S with B and B^c both nonempty, the set {B, B^c} is a partition of S. Thus, by Theorem 3.5, if P(B) > 0 and P(B^c) > 0, then for any event A of S with P(A) > 0,

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]

and

P(B^c|A) = P(A|B^c)P(B^c) / [P(A|B)P(B) + P(A|B^c)P(B^c)].

⇒ calculation of the posterior probability P(B|A) from the prior probability P(B) and the conditional probability P(A|B).
Note 1: The typical situation is that A happens logically or temporally after B, so that P(B) and P(A|B) can be readily computed. Then we want to know the posterior probability P(B|A).

Note 2: In practice, Bayes' formula is used when we know the effect of a cause (hypothesis) and we wish to make some inference about the cause (hypothesis).
Example 3.18 In a study conducted three years ago, 82% of the people in a randomly selected sample were found to have “good” financial credit ratings, while the remaining 18% were found to have “bad” financial credit ratings. Current records of the people from that sample show that 30% of those with bad credit ratings have since improved their ratings to good, while 15% of those with good credit ratings have since changed to having a
bad rating. What percentage of people with good credit ratings now had bad ratings three years ago?
Answer: 0.072
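A minimal sketch of the Bayes computation behind the answer (the dictionary names are mine; note that P(good now | good then) = 1 − 0.15 = 0.85):

```python
# Hypotheses: the rating three years ago was good (G) or bad (B).
# Observed event: the rating now is good.
p_prior = {"G": 0.82, "B": 0.18}
p_good_now = {"G": 1 - 0.15, "B": 0.30}   # P(good now | rating then)

evidence = sum(p_good_now[h] * p_prior[h] for h in p_prior)
posterior_bad = p_good_now["B"] * p_prior["B"] / evidence
# posterior_bad is P(bad then | good now), approximately 0.072
```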
Example 3.21 A box contains seven red and 13 blue balls. Two balls are selected at random and are discarded without their colors being seen. If a third ball is drawn randomly and observed to be red, what is the probability that both of the discarded balls were blue? Answer: 0.46
Solution: Let BB, BR, and RR be the events that the discarded balls are blue and blue, blue and red, red and red, respectively. Also, let R be the event that the third ball drawn is red. Since {BB, BR, RR} is a partition of the sample space, Bayes' formula can be used to calculate P(BB|R).
P(BB|R) = P(R|BB)P(BB) / [P(R|BB)P(BB) + P(R|BR)P(BR) + P(R|RR)P(RR)].

Now

P(BB) = (13/20)(12/19) = 39/95,  P(RR) = (7/20)(6/19) = 21/190,

and

P(BR) = (13/20)(7/19) + (7/20)(13/19) = 91/190,
where the last equation follows since BR is the union of two disjoint events: namely, the first ball discarded was blue and the second red, or vice versa.
Thus

P(BB|R) = (7/18)(39/95) / [(7/18)(39/95) + (6/18)(91/190) + (5/18)(21/190)] ≈ 0.46.
This result can also be found from the tree diagram of Figure 3.6, on page 105. Alternatively, reducing the sample space: given that the third ball was red, there are 13 blue and six red balls that could have been discarded. Thus

P(BB|R) = (13/19)(12/18) ≈ 0.46.
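As a sanity check on both solution methods, the urn can be simulated directly; this sketch is my own (the seed and trial count are arbitrary).

```python
import random

def blue_blue_given_red(trials=200_000, seed=3):
    """Estimate P(both discarded balls blue | third ball red) for the
    urn of Example 3.21 (7 red, 13 blue balls)."""
    rng = random.Random(seed)
    balls = ["R"] * 7 + ["B"] * 13
    hits = conditioned = 0
    for _ in range(trials):
        rng.shuffle(balls)
        if balls[2] == "R":                  # condition on the third draw
            conditioned += 1
            hits += balls[0] == "B" and balls[1] == "B"
    return hits / conditioned

estimate = blue_blue_given_red()
# Exact answer: (13/19)(12/18) = 26/57, about 0.456
```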
3.5. Independence
Independence (statistical independence) If A is independent of B, knowledge regarding the occurrence of event B does not change the chance of the occurrence of event A. That is,

P(A|B) = P(A)

∴ P(AB) = P(B)P(A|B) = P(B)P(A) ⇒ P(AB) = P(A)P(B).

Equivalently,

P(B|A) = P(B) ⇒ P(BA) = P(A)P(B|A) = P(A)P(B).

By symmetry, if A is independent of B, then B is independent of A.
Definition
Two events A and B are called independent if

P(AB) = P(A)P(B).

If two events are not independent, they are called dependent. If A and B are independent, we say that {A, B} is an independent set of events. Note that by this definition any event A with P(A) = 0 or 1 is independent of every event B.
Example 3.22 In the experiment of tossing a fair coin twice, let A and B be the events of getting heads on the first and second tosses, respectively. Intuitively, it is clear that A and B
are independent. To prove this mathematically, note that P(A) = 1/2 and P(B) = 1/2. But since the sample space of this experiment consists of the four equally probable outcomes HH, HT, TH, and TT, we have P(AB) = P(HH) = 1/4. Hence P(AB) = P(A)P(B) is valid, implying the independence of A and B. It is interesting to know that Jean Le Rond d'Alembert (1717–1783), a French mathematician, had argued that, since in the experiment of tossing a fair coin twice the possible numbers of heads are 0, 1, and 2, the probabilities of no heads, one head, and two heads are each 1/3.
Example 3.23 In the experiment of drawing a card from an ordinary deck of 52 cards, let A and B be the events of getting a heart and an ace, respectively. Whether A and B are independent cannot be answered easily on the basis of intuition alone. However, using the defining formula P(AB) = P(A)P(B), this can be answered at once, since P(AB) = 1/52, P(A) = 1/4, P(B) = 1/13, and 1/52 = 1/4 × 1/13. Hence A and B are independent events.
Example 3.24 An urn contains five red and seven blue balls. Suppose that two balls are selected at random and with replacement. Let A and B be the events that the first and the second balls are red, respectively. Then, using the counting principle, we get P(AB) = (5 × 5)/(12 × 12). Now P(AB) = P(A)P(B), since P(A) = 5/12 and P(B) = 5/12. Thus A and B are independent. If we do the same experiment without replacement, then P(B|A) = 4/11, while

P(B) = P(B|A)P(A) + P(B|A^c)P(A^c) = (4/11)(5/12) + (5/11)(7/12) = 5/12,

which might be quite surprising to some. But it is true: if no information is given on the outcome of the first draw, there is no reason for the probability of the second ball being red to differ from 5/12. Thus P(B|A) ≠ P(B), implying that A and B are dependent.
Theorem 3.6 If A and B are independent, then A and B^c are independent as well.

Proof: P(A) = P(AB) + P(AB^c), so

P(AB^c) = P(A) − P(AB) = P(A) − P(A)P(B) = P(A)[1 − P(B)] = P(A)P(B^c).
Corollary If A and B are independent, then cA and cB are independent as well.
Proof: A and B are independent A⇒ and cB are independent cB⇒ and A are independent cB⇒ and cA are independent.
Remark
If A and B are independent, knowledge about the occurrence or nonoccurrence
of A does not change the chances of the occurrence or nonoccurrence of B.

If A and B are mutually exclusive events and P(A) > 0 and P(B) > 0, then they are dependent. This is because if we are given that one has occurred, the chance of the occurrence of the other one is 0.
For example, let A be the event that the next president of the United States is a Democrat and B be the event that he or she is a Republican. Then A and B are mutually exclusive; hence they are dependent. If A occurs, that is, if the next president is a Democrat, the probability that B occurs, that is, he or she is a Republican is zero, and vice versa.
Example 3.27 Dennis arrives at his office every day at a random time between 8:00 A.M. and 9:00 A.M. Let A be the event that Dennis arrives at his office tomorrow between 8:15 A.M. and 8:45 A.M., B the event that he arrives between 8:30 A.M. and 9:00 A.M., and C the event that he arrives either between 8:15 A.M. and 8:30 A.M. or between 8:45 A.M. and 9:00 A.M. Then AB, AC, BC, and B ∪ C are the events that he arrives between 8:30 and 8:45, 8:15 and 8:30, 8:45 and 9:00, and 8:15 and 9:00, respectively. Thus P(A) = P(B) = P(C) = 1/2 and P(AB) = P(AC) = 1/4. So P(AB) = P(A)P(B) and P(AC) = P(A)P(C); that is, A is independent of B and it is independent of C. However, since BC and A are mutually exclusive, they are dependent. Also, P(A|B ∪ C) = 2/3 ≠ P(A). Thus B ∪ C and A are dependent as well.
Definition (Extended Concept of Independence)

The events A, B, and C are called independent if
( ) ( ) ( )P AB P A P B=
( ) ( ) ( )P AC P A P C=
( ) ( ) ( )P BC P B P C=
( ) ( ) ( ) ( )P ABC P A P B P C=
If A , B and C are independent, we say that { }, ,A B C is an independent set of events.
Example 3.29 Let an experiment consist of throwing a die twice. Let A be the event that in the second throw the die lands 1, 2, or 5; B the event that in the second throw it lands 4, 5 or 6; and C the event that the sum of the two outcomes is 9.
Then P(A) = P(B) = 1/2, P(C) = 1/9, and

P(AB) = 1/6 ≠ 1/4 = P(A)P(B),
P(AC) = 1/36 ≠ 1/18 = P(A)P(C),

while

P(ABC) = 1/36 = P(A)P(B)P(C).
Thus the validity of P(ABC) = P(A)P(B)P(C) is not sufficient for the independence of A, B, and C.
Definition (Generalized Concept of Independence)
The set of events {A_1, A_2, …, A_n} is called independent if for every subset {A_{i1}, A_{i2}, …, A_{ik}}, k ≥ 2, of {A_1, A_2, …, A_n},

P(A_{i1}A_{i2}⋯A_{ik}) = P(A_{i1})P(A_{i2})⋯P(A_{ik}).

By definition, the following relations must hold for all combinations 1 ≤ i < j < k < ⋯ ≤ n:

P(A_iA_j) = P(A_i)P(A_j),
P(A_iA_jA_k) = P(A_i)P(A_j)P(A_k),
⋮
P(A_1A_2⋯A_n) = P(A_1)P(A_2)⋯P(A_n).

The number of equations that must be satisfied is

C(n,2) + C(n,3) + ⋯ + C(n,n) = 2^n − C(n,0) − C(n,1) = 2^n − 1 − n.

Another view: the number of subsets of {A_1, A_2, …, A_n} is 2^n, including ∅ and the n subsets with only one event. Hence the number of equations that must be satisfied for n independent events is 2^n − n − 1.
Example 3.32 Figure 3.8 shows an electric circuit in which each of the switches located at 1, 2, 3, and 4 is independently closed or open with probabilities p and 1 p− , respectively. If a signal is fed to the input, what is the probability that it is transmitted to the output? Answer: 2 2(2 )p p−
Solution: Let E_i be the event that the switch at location i is closed, 1 ≤ i ≤ 4. A signal fed to the input will be transmitted to the output if at least one of the events E_1E_2 and E_3E_4 occurs. Hence the desired probability is

P(E_1E_2 ∪ E_3E_4) = P(E_1E_2) + P(E_3E_4) − P(E_1E_2E_3E_4) = p^2 + p^2 − p^4 = p^2(2 − p^2).
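The closed form p²(2 − p²) can be verified by summing the probabilities of all 2⁴ switch configurations; this enumeration is my own sketch.

```python
from itertools import product

def transmit_probability(p):
    """P(signal transmitted) for the circuit of Example 3.32: two
    parallel branches, each with two switches in series (1-2 and 3-4)."""
    total = 0.0
    for state in product([0, 1], repeat=4):
        prob = 1.0
        for s in state:                      # switches close independently
            prob *= p if s else 1 - p
        s1, s2, s3, s4 = state
        if (s1 and s2) or (s3 and s4):       # at least one branch conducts
            total += prob
    return total
```

For any p, transmit_probability(p) agrees with p²(2 − p²) up to rounding.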
Probability Theory. Chapter 4: Distribution Function and Discrete Random Variables 1/17
Department of Electronic Engineering, Fu-Jen Catholic University August 13, 2007
Chapter 4. Distribution Function and Discrete Random Variables
4.1. Random Variables
Quantities that do not have fixed values:
The values of such quantities depend on random actions, and they usually change from one experiment to another.
For example, the number of babies born in a certain hospital each day, the arrival time of a bus at a station, the sum of the outcomes of two dice when thrown, the amount of rainfall in Seattle during a given year, the number of earthquakes that occur in California per month, the weight of grains of wheat grown on a certain plot of ground, etc.
In probability, the quantities introduced in these diverse examples are called random variables; their numerical values are unknown in advance.
The numerical (experimental) values depend on random elements occurring at the time of the experiment, and over which we have no control.
Example In rolling two fair dice, if X is the sum, then X ∈ {2, 3, …, 12}, and we should have

Σ_{i=2}^{12} P(X = i) = 1.

For instance,

P(X = 2) = P({(1,1)}) = 1/36,
P(X = 3) = P({(1,2), (2,1)}) = 2/36,
P(X = 4) = P({(1,3), (2,2), (3,1)}) = 3/36, …
Definition Let S be the sample space of an experiment. A real-valued function X : S → ℝ is called a random variable of the experiment if, for each interval B ⊆ ℝ, {s : X(s) ∈ B} is an event.

The set {s : X(s) ∈ B} is often abbreviated as {X ∈ B}, or simply as X ∈ B.
Example 4.1 Suppose that three cards are drawn from an ordinary deck of 52 cards, one-by-one, at random and with replacement. Let X be the number of spades drawn; then X is a random variable. If an outcome of spades is denoted by s, and other outcomes are represented by t, then X is a real-valued function defined on the sample space

S = {(s,s,s), (t,s,s), (s,t,s), (s,s,t), (t,t,s), (t,s,t), (s,t,t), (t,t,t)}

by X(s,s,s) = 3, X(s,t,s) = 2, X(s,s,t) = 2, X(s,t,t) = 1, and so on. Now we must determine the values that X assumes and the probabilities that are associated with them. Clearly, X can take the values 0, 1, 2, and 3. The probabilities associated with these values are calculated as follows:

P(X = 0) = P({(t,t,t)}) = (3/4)(3/4)(3/4) = 27/64,
P(X = 1) = P({(s,t,t), (t,s,t), (t,t,s)}) = 3 × (1/4)(3/4)(3/4) = 27/64,
P(X = 2) = P({(s,s,t), (s,t,s), (t,s,s)}) = 3 × (1/4)(1/4)(3/4) = 9/64,
P(X = 3) = P({(s,s,s)}) = (1/4)(1/4)(1/4) = 1/64.
If the cards are drawn without replacement, the probabilities associated with the values 0, 1, 2, and 3 are:
P(X = 0) = C(39,3)/C(52,3),  P(X = 1) = C(13,1)C(39,2)/C(52,3),
P(X = 2) = C(13,2)C(39,1)/C(52,3),  P(X = 3) = C(13,3)/C(52,3).

Therefore,

P(X = i) = C(13,i)C(39,3−i)/C(52,3),  i = 0, 1, 2, 3.
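Both probability functions can be tabulated exactly; this sketch is mine and uses Python's math.comb for the binomial coefficients.

```python
from fractions import Fraction
from math import comb

# X = number of spades in three draws from a 52-card deck (Example 4.1).
# With replacement: P(X=i) = C(3,i) (1/4)^i (3/4)^(3-i).
with_repl = [Fraction(comb(3, i) * 3 ** (3 - i), 4 ** 3) for i in range(4)]
# Without replacement: P(X=i) = C(13,i) C(39,3-i) / C(52,3).
without_repl = [Fraction(comb(13, i) * comb(39, 3 - i), comb(52, 3))
                for i in range(4)]
```

Each list sums to 1, as a probability function must; for instance, with_repl[0] is 27/64.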
Example 4.2 A bus stops at a station every day at some random time between 11:00 A.M. and 11:30 A.M. If X is the actual arrival time of the bus, X is a random variable. It is a function defined on the sample space S = {t : 11 < t < 11½} by X(t) = t. As we know from Section 1.7, P(X = t) = 0 for any t ∈ S, and

P(X ∈ (α, β)) = (β − α)/(11½ − 11) = 2(β − α)

for any subinterval (α, β) of (11, 11½).
Example 4.3 In the U.S., the number of twin births is approximately 1 in 90. Let X be the number of births in a certain hospital until the first twins are born. X is a random variable. Denote twin births by T and single births by N. Then X is a real-valued function defined on the sample space S = {T, NT, NNT, NNNT, …} by X(NN⋯NT) = i when the T is preceded by i − 1 N's. The set of possible values of X is {1, 2, 3, …} and

P(X = i) = P(NN⋯NT) = (89/90)^{i−1} (1/90).
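With exact rational arithmetic one can confirm that these geometric probabilities behave as a probability function: the mass of the first n values plus the tail (89/90)^n is exactly 1. The cutoff 500 below is arbitrary; the sketch is mine.

```python
from fractions import Fraction

p = Fraction(1, 90)                    # P(a birth is twins)

def pmf(i):
    """P(X = i) = (89/90)**(i-1) * (1/90), for i = 1, 2, 3, ..."""
    return (1 - p) ** (i - 1) * p

partial = sum(pmf(i) for i in range(1, 501))   # mass of i = 1..500
tail = (1 - p) ** 500                          # geometric tail mass
# partial + tail == 1 exactly, since the geometric series telescopes.
```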
Example 4.5 The diameter of a flat metal disk manufactured by a factory is a random number between 4 and 4.5. What is the probability that the area of such a flat disk chosen at random is at least 4.41π? Answer: 3/5
Solution: Let D be the diameter of the metal disk selected at random. D is a random variable, and the area of the metal disk, π(D/2)^2, which is a function of D, is also a random variable. We are interested in the probability of the event πD^2/4 > 4.41π, which is

P(πD^2/4 > 4.41π) = P(D^2 > 17.64) = P(D > 4.2).

Now, to calculate P(D > 4.2), note that, since D is a random variable in the interval (4, 4.5), the probability that it falls into the subinterval (4.2, 4.5) is (4.5 − 4.2)/(4.5 − 4) = 3/5. Hence P(πD^2/4 > 4.41π) = 3/5.
Example 4.6 A random number is selected from the interval (0, π/2). What is the probability that its sine is greater than its cosine? Answer: 1/2
Solution: Let the number selected be X; then X is a random variable, and therefore sin X and cos X, which are functions of X, are also random variables. We are interested in the probability of the event sin X > cos X:
P(sin X > cos X) = P(tan X > 1) = P(X > π/4) = (π/2 − π/4)/(π/2 − 0) = 1/2,

where the first equality holds since in the interval (0, π/2), cos X > 0, and the second equality holds since, in this interval, tan X is strictly increasing.
4.2. Distribution Functions
Usually, when dealing with a random variable X, for constants a and b the computation of the probabilities P(X = a), P(X < a), P(a < X ≤ b), etc., is our ultimate goal.
Since the real-valued function ( )P X t≤ characterizes X , it tells us almost everything about X .
Definition
If X is a random variable, then the function F defined on (−∞, ∞) by F(t) = P(X ≤ t) is called the distribution function of X.

F accumulates all of the probability of the values of X up to and including t. Sometimes it is called the cumulative distribution function (cdf) of X.
Properties

F is nondecreasing: if t < u, then F(t) ≤ F(u).
Proof: The occurrence of the event {X ≤ t} implies the occurrence of the event {X ≤ u}:
{X ≤ t} ⊆ {X ≤ u} ⇒ P(X ≤ t) ≤ P(X ≤ u) ⇒ F(t) ≤ F(u).

lim_{t→∞} F(t) = 1.
Proof: It suffices to prove, for any increasing sequence {t_n} of real numbers that converges to ∞, that lim_{n→∞} F(t_n) = 1. The events {X ≤ t_n} form an increasing sequence that converges to the event ∪_{n=1}^∞ {X ≤ t_n} = S, so lim_{n→∞} F(t_n) = P(S) = 1.
lim_{t→−∞} F(t) = 0.

F is right continuous: for every t ∈ ℝ, F(t+) = F(t). This means that if {t_n} is a decreasing sequence of real numbers converging to t, then lim_{n→∞} F(t_n) = F(t).
Proof: The events {X ≤ t_n} form a decreasing sequence that converges to the event ∩_{n=1}^∞ {X ≤ t_n}, and

∩_{n=1}^∞ {X ≤ t_n} = {X ≤ t}

⇒ lim_{n→∞} P(X ≤ t_n) = P(X ≤ t) ⇒ lim_{n→∞} F(t_n) = F(t).
Some Probability Questions:

P(X > a) = 1 − P(X ≤ a) = 1 − F(a).

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a), for b > a.

P(X < a) = F(a−): the events {X ≤ a − 1/n} form an increasing sequence that converges to {X < a}, i.e., ∪_{n=1}^∞ {X ≤ a − 1/n} = {X < a}, so P(X < a) = lim_{n→∞} F(a − 1/n) = F(a−).
Example 4.7 The distribution function of a random variable X is given by

F(x) =
  0,            x < 0
  x/4,          0 ≤ x < 1
  1/2,          1 ≤ x < 2
  x/12 + 1/2,   2 ≤ x < 3
  1,            x ≥ 3.
Example 4.8 For the experiment of flipping a fair coin twice, let X be the number of tails. Calculate F(t), the distribution function of X, and sketch its graph.
Solution: Since X assumes only the values 0, 1, and 2, we have F(t) = P(X ≤ t) = 0 if t < 0.

If 0 ≤ t < 1, then F(t) = P(X ≤ t) = P(X = 0) = P({HH}) = 1/4.

If 1 ≤ t < 2, then F(t) = P(X ≤ t) = P(X = 0 or 1) = P({HH, HT, TH}) = 3/4,

and if t ≥ 2, then F(t) = P(X ≤ t) = 1. Hence

F(t) =
  0,     t < 0
  1/4,   0 ≤ t < 1
  3/4,   1 ≤ t < 2
  1,     t ≥ 2.
Example 4.9 Suppose that a bus arrives at a station every day between 10:00 A.M. and 10:30 A.M., at random. Let X be the arrival time; find the distribution function of X and sketch its graph. Answer:

F(t) =
  0,           t < 10
  2(t − 10),   10 ≤ t < 10½
  1,           t ≥ 10½.
If t < x_1, then F(t) = 0.

If x_1 ≤ t < x_2, then F(t) = P(X ≤ t) = P(X = x_1) = p(x_1).

If x_2 ≤ t < x_3, then F(t) = P(X ≤ t) = P(X = x_1 or X = x_2) = p(x_1) + p(x_2).

In general, if x_{n−1} ≤ t < x_n, then

F(t) = Σ_{i=1}^{n−1} p(x_i).

F is constant on [x_{n−1}, x_n), with jumps at x_1, x_2, x_3, …. The magnitude of the jump at x_i is p(x_i).
Example 4.11 In the experiment of rolling a balanced die twice, let X be the maximum of the two numbers obtained. Determine and sketch the probability function and the distribution function of X.

Solution: The possible values of X are 1, 2, 3, 4, 5, and 6. The sample space of this experiment consists of 36 equally likely outcomes. Hence the probability of any of them is 1/36. Thus
p(1) = P(X = 1) = P({(1,1)}) = 1/36,
p(2) = P(X = 2) = P({(1,2), (2,2), (2,1)}) = 3/36,
p(3) = P(X = 3) = P({(1,3), (2,3), (3,3), (3,2), (3,1)}) = 5/36.

Similarly, p(4) = 7/36, p(5) = 9/36, and p(6) = 11/36; p(x) = 0 for x ∉ {1, 2, 3, 4, 5, 6}. The distribution function of X is

F(x) =
  0,       x < 1
  1/36,    1 ≤ x < 2
  4/36,    2 ≤ x < 3
  9/36,    3 ≤ x < 4
  16/36,   4 ≤ x < 5
  25/36,   5 ≤ x < 6
  1,       x ≥ 6.
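The jumps p(k) = (2k − 1)/36 and the values F(k) = k²/36 can be confirmed by enumerating the 36 equally likely outcomes; this sketch is mine.

```python
from fractions import Fraction
from itertools import product

# X = maximum of two die rolls (Example 4.11).
outcomes = list(product(range(1, 7), repeat=2))
pmf = {k: Fraction(sum(max(o) == k for o in outcomes), 36)
       for k in range(1, 7)}
cdf = {k: sum(pmf[j] for j in range(1, k + 1)) for k in range(1, 7)}
```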
4.4. Expectations of Discrete Random Variables
Expected Value If X is the random variable denoting the gain in one play, then the parameter
( )E X is the expected (average) value of X . If we play the game n times and find the average of the values of X , then as
n →∞ , ( )E X is obtained.
Fair Game
If for some game E(X) = 0, then in the long run the player neither loses nor wins. Such games are called fair.
Definition
The expected value of a discrete random variable X with the set of possible values A and probability function p(x) is defined by
$$
E(X) = \sum_{x \in A} x\,p(x).
$$
E(X) exists if this sum converges absolutely.
Note: E(X) is also called the mean, the mathematical expectation, or the expectation of X. It is also denoted by E[X], EX, μ_X, or μ.
Other Views
If each value x of X is weighted by p(x) = P(X = x), then $\sum_{x \in A} x\,p(x)$ is nothing but the weighted average of X.
If we think of a unit mass distributed along the real line at the points of A so that the mass at x ∈ A is P(X = x), then E(X) is the center of gravity.
Example 4.14 We flip a fair coin twice and let X be the number of heads obtained. What is the expected value of X? Answer: E(X) = 1.
Example 4.15 We write the numbers a_1, a_2, …, a_n on n identical balls and mix them in a box. What is the expected value of the number on a ball selected at random? Answer: E(X) = (a_1 + a_2 + ⋯ + a_n)/n.
Example 4.20 The tanks of a country's army are numbered 1 to N. In a war this country loses n random tanks to the enemy, who discovers that the captured tanks are numbered. If X_1, X_2, …, X_n are the numbers of the captured tanks, what is E(max X_i)? How can the enemy use E(max X_i) to find an estimate of N, the total number of this country's tanks?
Solution: Let Y = max X_i; then
$$
P(Y = k) = \frac{\binom{k-1}{n-1}}{\binom{N}{n}} \quad \text{for } k = n,\, n+1,\, \dots,\, N,
$$
because if the maximum of the X_i's is k, the numbers of the remaining n − 1 tanks are from 1 to k − 1. Now
$$
E(Y) = \sum_{k=n}^{N} k\,P(Y = k)
     = \sum_{k=n}^{N} \frac{k\binom{k-1}{n-1}}{\binom{N}{n}}
     = \frac{1}{\binom{N}{n}} \sum_{k=n}^{N} \frac{k\,(k-1)!}{(n-1)!\,(k-n)!}
     = \frac{n}{\binom{N}{n}} \sum_{k=n}^{N} \frac{k!}{n!\,(k-n)!}
     = \frac{n}{\binom{N}{n}} \sum_{k=n}^{N} \binom{k}{n}. \quad (4.2)
$$
To calculate $\sum_{k=n}^{N} \binom{k}{n}$, note that $\binom{k}{n}$ is the coefficient of $x^n$ in the polynomial $(1+x)^k$. Therefore, $\sum_{k=n}^{N} \binom{k}{n}$ is the coefficient of $x^n$ in the polynomial $\sum_{k=n}^{N} (1+x)^k$. Since
$$
\sum_{k=n}^{N} (1+x)^k = (1+x)^n \cdot \frac{(1+x)^{N-n+1} - 1}{(1+x) - 1}
 = \frac{1}{x}\Big[(1+x)^{N+1} - (1+x)^n\Big],
$$
and the coefficient of $x^n$ in the polynomial $\frac{1}{x}\big[(1+x)^{N+1} - (1+x)^n\big]$ is $\binom{N+1}{n+1}$, we have
$$
\sum_{k=n}^{N} \binom{k}{n} = \binom{N+1}{n+1}.
$$
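This "hockey-stick" identity is easy to spot-check numerically:

```python
from math import comb

# Check sum_{k=n}^{N} C(k, n) = C(N+1, n+1) over a small grid of (n, N)
for n in range(6):
    for N in range(n, 12):
        assert sum(comb(k, n) for k in range(n, N + 1)) == comb(N + 1, n + 1)
print("identity verified for n <= 5, N <= 11")
```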
Substituting this result in (4.2), we obtain
$$
E(Y) = \frac{n\binom{N+1}{n+1}}{\binom{N}{n}}
     = \frac{n\,\dfrac{(N+1)!}{(n+1)!\,(N-n)!}}{\dfrac{N!}{n!\,(N-n)!}}
     = \frac{n(N+1)}{n+1}.
$$
To estimate N, the total number of this country's tanks, we solve
$$
E(Y) = \frac{n(N+1)}{n+1}
$$
for N. We obtain
$$
N = \frac{n+1}{n}\,E(Y) - 1.
$$
Therefore, if, for example, the enemy captures 12 tanks and the maximum of the numbers of the captured tanks is 117, then, assuming that E(Y) is approximately equal to the observed value of Y, we get N ≈ (13/12) × 117 − 1 ≈ 126.
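The resulting estimator can be tried out in a short simulation; a sketch (the helper name `estimate_N` is ours):

```python
import random

def estimate_N(captured):
    """Point estimate from the derivation above: N_hat = (n+1)/n * max(captured) - 1."""
    n = len(captured)
    return (n + 1) / n * max(captured) - 1

# The numbers from the text: 12 captured tanks, largest serial number 117
print(round(estimate_N([117] + list(range(1, 12)))))  # 126

# Sanity check: with N = 500 tanks and n = 12 captures, the estimate is close to N on average
random.seed(0)
N, n = 500, 12
trials = [estimate_N(random.sample(range(1, N + 1), n)) for _ in range(2000)]
print(sum(trials) / len(trials))  # close to 500
```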
Theorem 4.1
If X is a constant random variable, that is, if P(X = c) = 1 for a constant c, then E(X) = c.
Proof: P(X ≠ c) = 0 ⇒ E(X) = c · P(X = c) = c · 1 = c.
Theorem 4.2 (Law of the Unconscious Statistician)
Let X be a discrete random variable with set of possible values A and probability function p(x), and let g be a real-valued function. Then g(X) is a random variable with
$$
E\big[g(X)\big] = \sum_{x \in A} g(x)\,p(x).
$$
Note that g(X) is a random variable with the set of possible values g(A).
Corollary
Let X be a discrete random variable; let f_1, f_2, …, f_n be real-valued functions, and let α_1, α_2, …, α_n be real numbers. Then
$$
E\big[\alpha_1 f_1(X) + \alpha_2 f_2(X) + \cdots + \alpha_n f_n(X)\big]
 = \alpha_1 E\big[f_1(X)\big] + \alpha_2 E\big[f_2(X)\big] + \cdots + \alpha_n E\big[f_n(X)\big].
$$
Examples:
(1) E(2X³ + 5X² + 7X + 4) = 2E(X³) + 5E(X²) + 7E(X) + 4
(2) E(e^X + 2 sin X + log X) = E(e^X) + 2E(sin X) + E(log X)
(3) E is linear: E(αX + β) = αE(X) + β
Example 4.23 The probability function of the discrete random variable X is given by
$$
p(x) = \begin{cases} x/15, & x = 1, 2, 3, 4, 5 \\ 0, & \text{otherwise.} \end{cases}
$$
What is the expected value of X(6 − X)? Answer: 7.
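The answer follows from Theorem 4.2 with g(x) = x(6 − x); a one-screen check:

```python
from fractions import Fraction

# p(x) = x/15 for x = 1, ..., 5; E[g(X)] = sum of g(x) p(x)  (Theorem 4.2)
p = {x: Fraction(x, 15) for x in range(1, 6)}
expected = sum(x * (6 - x) * p[x] for x in p)
print(expected)  # 7
```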
4.5. Variances and Moments of Discrete Random Variables
Additional measures for decision-making are needed. Let Y be the true value of a quantity (the average of a large number of measurements, E(X)) minus the value X obtained by a single measurement; then Y is the error of measurement.
Y is a random variable with expected value zero. The reason is that in measuring a quantity a very large number of times, positive and negative errors of the same magnitude occur with equal probabilities:
$$
E(Y) = E(\mu - X) = \mu - E(X) = E(X) - E(X) = 0.
$$
Consider an experiment in which a quantity is measured several times and the average of the errors is found to be close to zero. Can we conclude that the measurements are very close to the true value and thus accurate? The answer is no, because they might differ from the true value by relatively large amounts that are scattered in both the positive and negative directions.
Expectation by itself does not give adequate information.
Variance of a Random Variable
Variance measures the average magnitude of the fluctuations of a random variable about its expectation.
E[X − E(X)] is not an appropriate measure for these fluctuations, since it is always zero; E|X − E(X)| is difficult to handle analytically. Instead, E[(X − E(X))²], analogous to Euclidean distance in geometry, is used and is called the variance of X.
The square root of E[(X − E(X))²] is called the standard deviation of X.
Definition
Let X be a discrete random variable with set of possible values A, probability function p(x), and E(X) = μ. Then σ_X and Var(X), called the standard deviation and the variance of X, respectively, are defined by
$$
\sigma_X = \sqrt{E\big[(X-\mu)^2\big]} \qquad \text{and} \qquad \sigma_X^2 = \mathrm{Var}(X) = E\big[(X-\mu)^2\big].
$$
By this definition and Theorem 4.2,
$$
\mathrm{Var}(X) = E\big[(X-\mu)^2\big] = \sum_{x \in A} (x-\mu)^2\,p(x).
$$
Note:
Let X be a discrete random variable with the set of possible values A and probability function p(x). Suppose the value t is predicted for X; then the error is X − t. We would like to minimize
$$
E\big[(X-t)^2\big] = \sum_{x \in A} (x-t)^2\,p(x).
$$
Setting the derivative to zero,
$$
\frac{d}{dt}\,E\big[(X-t)^2\big] = \frac{d}{dt} \sum_{x \in A} (x-t)^2\,p(x)
 = -2\sum_{x \in A} (x-t)\,p(x) = 0
 \;\Rightarrow\; t = \sum_{x \in A} x\,p(x) = E[X],
$$
so E[(X − t)²] is minimized for t = E[X], and the minimum is E[(X − E(X))²] = Var(X). Hence
$$
\mathrm{Var}(X) = \min_t\, E\big[(X-t)^2\big].
$$
The better t predicts X (the optimal choice being t = E[X]), the smaller E[(X − t)²] is; its smallest value is Var(X).
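This minimization claim can be checked on a concrete pmf, e.g. the one from Example 4.23 (p(x) = x/15 for x = 1, …, 5); a sketch:

```python
from fractions import Fraction

p = {x: Fraction(x, 15) for x in range(1, 6)}

def mse(t):
    """E[(X - t)^2] for the pmf above."""
    return sum((x - t) ** 2 * q for x, q in p.items())

mean = sum(x * q for x, q in p.items())            # E[X] = 11/3
grid = [Fraction(t, 12) for t in range(0, 73)]     # t = 0, 1/12, ..., 6
assert all(mse(t) >= mse(mean) for t in grid)      # no grid point beats t = E[X]
print(mean, mse(mean))                             # the minimum value is Var(X)
```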
Theorem 4.3: Var(X) = E(X²) − [E(X)]².
Proof:
$$
\mathrm{Var}(X) = E\big[(X-\mu)^2\big] = E\big(X^2 - 2\mu X + \mu^2\big)
 = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2 = E(X^2) - \big[E(X)\big]^2.
$$
Note 1: Since Var(X) ≥ 0, for any discrete random variable X, [E(X)]² ≤ E(X²).
Note 2: This formula is usually a better alternative for computing the variance of X.
Note 3: In circuit theory, E(X²) indicates the total power, [E(X)]² the DC power, and Var(X) = E[(X − μ)²] the AC power.
Example 4.27 What is the variance of the random variable X , the outcome of rolling a fair die? Answer: Var(X)=35/12
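The value 35/12 follows from Theorem 4.3; a short check:

```python
from fractions import Fraction

xs = range(1, 7)                              # faces of a fair die
EX = sum(Fraction(x, 6) for x in xs)          # E(X)   = 7/2
EX2 = sum(Fraction(x * x, 6) for x in xs)     # E(X^2) = 91/6
var = EX2 - EX ** 2                           # Theorem 4.3
print(var)  # 35/12
```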
Theorem 4.4
Let X be a discrete random variable with the set of possible values A and mean μ. Then Var(X) = 0 if and only if X is a constant with probability 1.
Proof:
(⇒) We have
$$
\mathrm{Var}(X) = E\big[(X-\mu)^2\big] = \sum_{x \in A} (x-\mu)^2\,p(x) = 0,
$$
and all the terms of this series are nonnegative, so x − μ = 0 for all x ∈ A. The set A consists of only one number, μ, so X = μ with probability 1.
(⇐) Conversely, if X is a constant, say μ, then X = E(X) = μ and Var(X) = E[(X − μ)²] = 0.
Theorem 4.5
Let X be a discrete random variable; then for constants a and b,
$$
\mathrm{Var}(aX+b) = a^2\,\mathrm{Var}(X), \qquad \sigma_{aX+b} = |a|\,\sigma_X.
$$
Proof:
$$
\mathrm{Var}(aX+b) = E\Big\{\big[aX+b-E(aX+b)\big]^2\Big\}
 = E\Big\{\big[aX+b-aE(X)-b\big]^2\Big\}
 = E\Big\{a^2\big[X-E(X)\big]^2\Big\} = a^2\,E\Big\{\big[X-E(X)\big]^2\Big\} = a^2\,\mathrm{Var}(X).
$$
Taking the square roots of both sides, we find that σ_{aX+b} = |a|σ_X.
Example 4.28 Suppose that, for a discrete random variable X, E(X) = 2 and E[X(X − 4)] = 5. Find the variance and the standard deviation of −4X + 12.
Answer: Var(X) = 9, Var(−4X + 12) = 144, and σ_{−4X+12} = 12.
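The arithmetic behind this answer, combining linearity with Theorems 4.3 and 4.5:

```python
# E[X(X - 4)] = E(X^2) - 4 E(X) = 5 and E(X) = 2, so E(X^2) = 5 + 4*2 = 13
EX = 2
EX2 = 5 + 4 * EX                  # 13
var_X = EX2 - EX ** 2             # Theorem 4.3: Var(X) = 13 - 4 = 9
var_Y = (-4) ** 2 * var_X         # Theorem 4.5 with a = -4: Var(-4X + 12) = 144
std_Y = var_Y ** 0.5              # standard deviation = 12.0
print(var_X, var_Y, std_Y)
```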
Concentration about an arbitrary point
Variance measures the dispersion, or spread, of a distribution about its expectation.
One way to find out which one of the two given random variables X and Y is
less dispersed, or spread, about an arbitrary point, is to see which one is more concentrated about the point (or which one has smaller variance).
Definition Let X and Y be two random variables and ω a given point. If for all t > 0,
$$
P\big(|Y - \omega| \le t\big) \le P\big(|X - \omega| \le t\big),
$$
then we say that X is more concentrated about ω than is Y.
Theorem 4.6 Suppose that X and Y are two random variables with E(X) = E(Y) = μ. If X is more concentrated about μ than is Y, then Var(X) ≤ Var(Y).
Moments
The expected value of X is also called the first moment of X. Provided that E(|g(X)|) < ∞, E[g(X)] …