Motivation
I’m revisiting my old statistical inference textbook with the aim of slowly working through many of the problems. When I started out, Casella and Berger’s Statistical Inference was my first ever exposure to statistics, and I didn’t have the mathematical preparation for it. Since then, I’ve developed greater mathematical maturity and better general knowledge and intuition about statistics.
I also want to learn how to implement some of the procedures and verify some of the properties in the book using R. So, when relevant, I will include R code addressing some of the exercises.
I’m starting from the beginning. Some of the early exercises are not terribly interesting, but bear with me. We’ll get there in time.
Exercises
We’re considering exercises 1.1 through 1.5 in section 1.7 (Casella and Berger 2002, 37). This chapter is an introduction to probability and other preliminaries, such as basic set theory.
Exercise 1.1
For each of the following experiments, describe the sample space
First, recall the definition of a sample space: The sample space is “The set, \(S\), of all possible outcomes of a particular experiment.” (p. 1).
(a) Toss a coin four times.
The sample space is the set of all \(2^4 = 16\) sequences of heads and tails: \[\begin{align*} S = \{&HHHH, HHHT, HHTH, HTHH, \\ &THHH, HHTT, HTHT, HTTH, \\ &THTH, TTHH, THHT, TTTH, \\ &TTHT, THTT, HTTT, TTTT\} \end{align*}\]
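The same listing can be generated in R rather than typed by hand. This is a minimal sketch: the names `tosses` and `outcomes` are my own, and `expand.grid()` is just one convenient way to enumerate the combinations.

```r
# Each row of `tosses` is one sequence of four coin tosses.
tosses <- expand.grid(rep(list(c("H", "T")), 4), stringsAsFactors = FALSE)

# Collapse each row into a single string such as "HTHT".
outcomes <- unname(apply(tosses, 1, paste, collapse = ""))

length(outcomes)  # 2^4 = 16 outcomes
sort(outcomes)
```

The count matches the 16 outcomes listed above, and the same code extends to more tosses by changing the 4.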
(b) Count the number of insect-damaged leaves on a plant
Since we are only interested in the number, and not in the extent of damage, the sample space is the set of nonnegative integers: \(S=\{0,1,2,3,...\}\). If we knew the total number of leaves (unlikely), we could put an upper bound on the possible outcomes.
(c) Measure the lifetime (in hours) of a particular brand of light bulb
The lifetime of a light bulb can’t be negative, but there’s no reason the answer needs to be a whole number. There’s also no clear upper bound. So we can say \(S=[0,\infty)\).
(d) Record the weights of 10-day-old rats
This is similar to the light bulb problem. The weights don’t need to be in whole numbers. While there is no need to impose an upper bound, it is easier to find one here than it was in the light bulb problem. We would be very surprised to find a 10-day-old rat weighing more than, say, 32 oz. So, measuring in ounces, we can say \(S=[0,32]\) (though any sufficiently large upper bound will do).
(e) Observe the proportion of defectives in a shipment of electronic components.
The sample space here will take on rational number values. Suppose there are \(N\) total components. The number of defectives can be anything from \(0\) to \(N\), so the possible values for the proportion of defectives are \(S=\left\{0, \frac{1}{N}, \frac{2}{N},...,\frac{N}{N}\right\}\).
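For a concrete picture, here is a small sketch with a hypothetical shipment size. The value `N <- 5` and the name `S_prop` are chosen only for illustration.

```r
N <- 5                  # hypothetical number of components in the shipment
S_prop <- (0:N) / N     # possible proportions: 0 defectives up to N defectives
S_prop
length(S_prop)          # N + 1 possible values
```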
Exercise 1.2
Verify the Following Identities
(a)
\(A \setminus B = A \setminus (A\cap B)=A\cap B^c\)
Proof (i = iii):
Label the three expressions (i), (ii), and (iii) from left to right. To prove set equality, we prove that (i) is a subset of (iii) and that (iii) is a subset of (i).
\[\begin{align*} x \in A \setminus B &\Rightarrow x\in A \text{ and } x \notin B \\ &\Rightarrow x \in A \text{ and } x \in B^c \\ &\Rightarrow x \in A \cap B^c \\ &\Rightarrow A \setminus B \subseteq A \cap B^c \\ \\ x \in A\cap B^c &\Rightarrow x \in A \text{ and } x \in B^c \\ &\Rightarrow x \in A \text{ and } x \notin B \\ &\Rightarrow x \in A \setminus B \\ &\Rightarrow A \cap B^c \subseteq A \setminus B \\ &&\blacksquare \end{align*}\]
The remaining equality, \(A \setminus (A \cap B) = A \cap B^c\), follows by applying this result with \(A \cap B\) in place of \(B\):
\[\begin{align*} A \setminus (A \cap B) &= A\cap (A \cap B)^c \\ &=A \cap (A^c \cup B^c) && \text{(DeMorgan's Laws)} \\ &=(A \cap A^c) \cup (A \cap B^c) && \text{(Distributive property)} \\ &= \emptyset \cup (A \cap B^c) \\ &= A\cap B^c = A \setminus B \\ && \blacksquare \end{align*} \]
We can verify some of this in R. We use the setdiff() and intersect() commands. We define a sample space S to contain A and B, so we can define \(B^c\) as \(S\setminus B\).
# The Sample Space
S=1:20
# Subsets A and B, corresponding to problem
A=1:10
B=6:10
B_complement = setdiff(S,B)
A_complement = setdiff(S,A)
# Set Difference
setdiff(A,B)
## [1] 1 2 3 4 5
# Set difference: A and A intersection B
setdiff(A, intersect(A,B))
## [1] 1 2 3 4 5
# A intersection B complement
intersect(A,B_complement)
## [1] 1 2 3 4 5
As you can see, we obtain the same answer from all three versions and, at least in this case, we have verified the identity.
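A single example is weak evidence, so one step further is to check the identity for every pair of subsets of a small sample space. This is a sketch under my own assumptions: the four-element space `S_small` and the helper `identity_holds()` are hypothetical names, not part of the exercise, and a brute-force check is still not a substitute for the proof.

```r
# Hypothetical small sample space, kept small so every pair can be checked.
S_small <- 1:4

# All 2^4 = 16 subsets, built from rows of TRUE/FALSE membership flags.
flags <- expand.grid(rep(list(c(FALSE, TRUE)), length(S_small)))
subsets <- lapply(seq_len(nrow(flags)), function(i) S_small[unlist(flags[i, ])])

# Does A \ B equal both A \ (A intersect B) and A intersect B^c?
identity_holds <- function(A, B) {
  lhs <- setdiff(A, B)
  setequal(lhs, setdiff(A, intersect(A, B))) &&
    setequal(lhs, intersect(A, setdiff(S_small, B)))
}

# Evaluate the identity for all 16 x 16 ordered pairs (A, B).
results <- sapply(subsets, function(A) {
  sapply(subsets, function(B) identity_holds(A, B))
})
all(results)
```

Using `setequal()` rather than `==` compares the sets regardless of element order and handles the empty set cleanly.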
(b)
\(B=(B \cap A) \cup (B \cap A^c)\)
Proof
Let \(S\) represent the sample space. Starting from the right-hand side:
\[ \begin{align*} (B \cap A) \cup (B \cap A^c) &= B \cap (A \cup A^c) && \text{(Distributive property)} \\ &= B \cap S && (A \cup A^c =S) \\ &= B && (B \cap S = B) \\ \\ && \blacksquare \end{align*} \]
We again verify in R, with A, B, and S as defined above. We add the union() command to our repertoire.
B
## [1] 6 7 8 9 10
# (B intersect A) U (B intersect A complement)
union(intersect(A,B),intersect(A_complement,B))
## [1] 6 7 8 9 10
Again, as expected, the identity holds in practice. We will dispense with the R examples in (c) and (d) as they are very similar.
(c)
\(B \setminus A = B \cap A^c\)
This was proven in part (a), with the roles of \(A\) and \(B\) exchanged.
(d)
\(A \cup B = A \cup (B \cap A^c)\). The proof is evident if we note, as in (b), that \(A \cup A^c = S\). Then \(A \cup (B \cap A^c) = (A \cup B) \cap (A \cup A^c) = (A \cup B) \cap S = A \cup B\).
Exercise 1.3
Show that:
(a) Commutativity holds for intersection and union of sets A and B
\[ \begin{align*} x \in A \cup B &\Rightarrow x \in A \text{ or } x \in B \\ &\Rightarrow x \in B \text{ or } x \in A \\ &\Rightarrow x \in B \cup A \\ && \blacksquare \end{align*} \]
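Every implication above is reversible, so the same chain read backwards gives \(B \cup A \subseteq A \cup B\), and hence \(A \cup B = B \cup A\). The argument for intersections is the same, with “and” in place of “or”. We omit part (b), associativity, because the argument is essentially the same, with an extra step.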
(c) Prove DeMorgan’s Laws for sets A and B.
In this case, we will only prove DeMorgan’s Law for set intersections. The proof for unions is, again, very similar.
\[ \begin{align*} x \in (A \cap B)^c &\Rightarrow x \notin (A \cap B) \\ &\Rightarrow x \notin A \text{ or } x \notin B \\ &\Rightarrow x \in A^c \text{ or } x \in B^c \\ &\Rightarrow x \in A^c \cup B^c \\ &\Rightarrow (A \cap B)^c \subseteq A^c \cup B^c \\ \\ x \in A^c \cup B^c &\Rightarrow x \notin A \text{ or } x \notin B \\ &\Rightarrow x \notin (A \cap B) \\ &\Rightarrow x \in (A \cap B)^c \\ &\Rightarrow A^c \cup B^c \subseteq (A \cap B)^c \\ && \blacksquare \end{align*} \]
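Both of DeMorgan’s Laws can be spot-checked in R. This is a minimal sketch: the sets below are illustrative choices of mine (note that `B` is redefined so that neither set contains the other), and `complement()` is a small helper I define here, not a base R function.

```r
# Illustrative sets; the values are arbitrary.
S <- 1:20
A <- 1:10
B <- 6:15

# Complement relative to the sample space S.
complement <- function(X) setdiff(S, X)

# (A intersect B)^c = A^c union B^c
dm_intersection <- setequal(complement(intersect(A, B)),
                            union(complement(A), complement(B)))

# (A union B)^c = A^c intersect B^c
dm_union <- setequal(complement(union(A, B)),
                     intersect(complement(A), complement(B)))

c(dm_intersection, dm_union)
```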
Exercise 1.4
Find formulas for the probabilities of the following events in terms of \(P(A)\), \(P(B)\), and \(P(A \cap B)\).
(a) “Either A or B or both”
This is just inclusive or, \(A \cup B\). We are not allowed to use \(P(A \cup B)\) directly in the answer, but theorem 1.2.9 (p. 10) states that \(P(A\cup B) = P(A)+P(B)-P(A \cap B)\), which uses only the quantities available to us.
(b) “Either A or B but not both”
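This is exclusive or: we want the probability that either \(A \cap B^c\) or \(B \cap A^c\) occurs. In other words, we’re interested in \(P((A \cap B^c) \cup (B \cap A^c))\). The two events are disjoint, \((A \cap B^c) \cap (B \cap A^c) = \emptyset\), so the probability of their union is the sum of their probabilities. Theorem 1.2.9 also gives \(P(B \cap A^c) = P(B) - P(A \cap B)\), and likewise with \(A\) and \(B\) exchanged. Therefore: \[\begin{align*} P((A \cap B^c) \cup (B \cap A^c)) &= P(A \cap B^c)+P(B \cap A^c) \\ &= [P(A)-P(A \cap B)]+[P(B)-P(A \cap B)] \\ &= P(A)+P(B)-2P(A \cap B) \end{align*}\]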
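The formula can be checked by counting on a finite sample space. This sketch assumes equally likely outcomes, so that a probability is just a proportion; the sets and the helper `prob()` are illustrative names of mine.

```r
# Hypothetical sample space with equally likely outcomes.
S <- 1:20
A <- 1:10
B <- 6:15

# Probability of an event E under equally likely outcomes.
prob <- function(E) length(E) / length(S)

# Outcomes that lie in exactly one of A and B.
exactly_one <- union(setdiff(A, B), setdiff(B, A))

direct  <- prob(exactly_one)
formula <- prob(A) + prob(B) - 2 * prob(intersect(A, B))
c(direct = direct, formula = formula)
```

Here `exactly_one` has 10 of the 20 outcomes, and the formula gives \(0.5 + 0.5 - 2(0.25) = 0.5\), so the two agree.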
(c) “At least one of A or B”
This is the same as part (a).
(d) “At most one of A or B”
It’s easiest to think about this problem in terms of what it excludes: \(A\cap B\). Everything else is included: just \(A\), just \(B\), and neither \(A\) nor \(B\). In other words, the event is \((A \cap B)^c\). Recalling that the total probability is 1, we can represent the solution as \(1-P(A\cap B)\). This is the probability of the whole sample space minus the probability of the intersection \(A \cap B\).
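A quick simulation gives a sanity check on this answer. Everything here is an assumption for illustration: outcomes are drawn uniformly from 1 to 20, with \(A = \{1,\dots,10\}\) and \(B = \{6,\dots,15\}\), so that \(P(A \cap B) = 5/20\). The seed is arbitrary.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
n <- 1e5
x <- sample(1:20, n, replace = TRUE)   # hypothetical equally likely outcomes

in_A <- x <= 10                        # A = {1, ..., 10}
in_B <- x >= 6 & x <= 15               # B = {6, ..., 15}

at_most_one <- !(in_A & in_B)          # everything except A intersect B
estimate <- mean(at_most_one)          # simulated relative frequency
exact <- 1 - 5 / 20                    # 1 - P(A intersect B)
c(estimate = estimate, exact = exact)
```

The simulated frequency should land close to the exact value of 0.75, though it will not match exactly.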
Exercise 1.5
Approximately one-third of all human twins are identical (one-egg) and two-thirds are fraternal (two-egg) twins. Identical twins are necessarily the same sex, with male and female being equally likely. Among fraternal twins, approximately one-fourth are both female, one-fourth are both male, and half are one male and one female. Finally, among all U.S. births, approximately 1 in 90 is a twin birth. Define the following events:
- A = {a U.S. birth results in twin females}
- B = {a U.S. birth results in identical twins}
- C = {a U.S. birth results in twins}
Part (a)
State, in words, the event \(A \cap B \cap C\)
Event A represents twin females; B represents identical twins; and C represents twins in general. The intersection of these three, then, represents identical female twins.
Part (b)
Find \(P(A\cap B\cap C)\)
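We multiply along the chain of conditions: \(P(A \cap B \cap C) = P(C) \times P(B \mid C) \times P(A \mid B \cap C) = \frac{1}{90} \times \frac{1}{3} \times \frac{1}{2} = \frac{1}{540}\). Note that \(\frac{1}{3}\) and \(\frac{1}{2}\) are conditional probabilities (identical given twins, and both female given identical twins), not the unconditional \(P(B)\) and \(P(A)\).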
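The arithmetic is easy to confirm in R. The variable names below are my own, and each factor is read as a probability conditional on the events before it.

```r
p_C <- 1 / 90           # a U.S. birth is a twin birth
p_B_given_C <- 1 / 3    # twins are identical, given a twin birth
p_A_given_BC <- 1 / 2   # both are female, given identical twins

p_ABC <- p_C * p_B_given_C * p_A_given_BC
p_ABC
all.equal(p_ABC, 1 / 540)
```

I use `all.equal()` instead of `==` because the comparison involves floating-point arithmetic.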
Concluding Notes
Again, these problems were pretty straightforward and mostly highlighted the basics of sets and probability calculations. The problems will get more sophisticated as we get further along. Stay tuned! I will probably try to do one post like this each week.
Casella, George, and Roger L. Berger. 2002. Statistical Inference. Thomson Learning.