Niemi Consulting

cutting-edge statistical consulting




Poisson random variable

10 July, 2008 (11:18) | introductory, random variables | By: jarad

Yesterday we discussed the negative binomial random variable and we mentioned that it has an infinite number of values consisting of all natural numbers. Another random variable whose values consist of all natural numbers, and one that is used more extensively, is the Poisson random variable.

The Poisson random variable is primarily used when you are counting observations and the upper limit of the count is technically infinity. Here are some examples:

  • number of field goals scored in a basketball game
  • number of blades of grass in one square meter
  • number of birds observed in a 10-minute count

The notation we use for a Poisson random variable is:

Y~Po(\lambda)

and we say `Y is a Poisson random variable with mean lambda.’

Unlike the binomial and negative binomial random variables, the Poisson random variable only has one parameter. This parameter gives the expected number for the random variable. But this number does not need to be a natural number. For example, we could have an expected number of 3.5. Any single count is still a whole number; in this case 3 is the single most likely value, with 4 and 2 close behind.

The probability mass function is

f(y)=\lambda^y e^{-\lambda}/y!

where e is the base of the natural logarithm (approximately 2.718) and y can be 0, 1, 2, and so on.
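To make the formula concrete, here is a minimal Python sketch that evaluates the pmf f(y)=\lambda^y e^{-\lambda}/y! directly. The function name poisson_pmf and the choice of \lambda=3.5 are mine, picked only for illustration, and the sums are cut off at 50 terms as an approximation to the infinite sum.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Po(lam), computed as lam^y * e^(-lam) / y!."""
    if y < 0:
        return 0.0
    return lam ** y * math.exp(-lam) / math.factorial(y)

lam = 3.5  # assumed mean, chosen only for illustration

# Print the probability of each count from 0 to 7.
for y in range(8):
    print(y, round(poisson_pmf(y, lam), 4))

# The count has no upper limit, but the probabilities shrink quickly,
# so the first 50 terms are a very close stand-in for the full sum.
total = sum(poisson_pmf(y, lam) for y in range(50))

# The expected value, sum of y * f(y), should recover lam.
mean = sum(y * poisson_pmf(y, lam) for y in range(50))
print(total, mean)
```

The two sums are a quick sanity check: the probabilities should add up to (very nearly) 1 and the mean should come back as \lambda.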

Bottom Line: A Poisson random variable can be used to model counts when the count has no upper limit.

Negative binomial random variable

9 July, 2008 (11:36) | introductory, random variables | By: jarad

Continuing our discussion of random variables, we introduce a random variable called the negative binomial. Although it is not one that I use very often, I thought it was a nice transition from the Bernoulli and binomial random variables, which have a finite number of values, to the random variables that have an infinite number of values.

The negative binomial is also related to the Bernoulli, but it answers the question: how many negative outcomes occur before a set number of positive outcomes? Here are some examples:

  • number of tails before 3 heads when flipping a coin (3,1/2)
  • number of ones and twos before 5 fours when rolling a six-sided die (5,1/3)
  • number of losses before the MN Twins win 10 baseball games (10,?)

The notation we use for a negative binomial random variable is:

Y~NBin(r,p)

and we say `Y is a negative binomial random variable waiting for r positive outcomes and each trial has probability p.’

Similar to the binomial random variable, this random variable has two parameters. The parameter that is common to both of these random variables is p, the probability of a positive outcome. The new parameter is r, the number of positive outcomes. With the same p, we expect that Y will generally be larger if r is larger. The parameters for the examples given above are provided in the parentheses. For the third example, we do not know what the probability of a success will be and therefore I have put a question mark here.

The probability mass function is

f(y)=(y+r-1 choose y) p^r(1-p)^y

The notation used in this pmf is explained in the binomial random variable post. When discussing the binomial random variable, I mentioned that the random variable has a finite number of values it can be, namely integers from 0 up to n. The negative binomial random variable does not have this restriction.

This is clearly demonstrated with the coin flipping example. Ask yourself the question, what is the maximum number of tails you could possibly observe before observing 3 heads? The answer is that there is no maximum. Therefore y can take on integer values from 0 up to infinity.
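Here is a short Python sketch of the coin flipping example. It assumes the usual form of the pmf for the number of negative outcomes before the r-th positive outcome, f(y)=(y+r-1 choose y) p^r(1-p)^y. The function name nbinom_pmf is mine, and other references define the negative binomial slightly differently (for example, by counting total trials), so treat this as one convention rather than the only one.

```python
import math

def nbinom_pmf(y, r, p):
    """P(Y = y): y negative outcomes before the r-th positive outcome."""
    if y < 0:
        return 0.0
    return math.comb(y + r - 1, y) * p ** r * (1 - p) ** y

r, p = 3, 0.5  # number of tails before 3 heads with a fair coin

# Probability of seeing 0, 1, ..., 5 tails before the third head.
for y in range(6):
    print(y, nbinom_pmf(y, r, p))

# There is no maximum: even a very large number of tails has positive probability.
print(nbinom_pmf(100, r, p))

# The first 200 terms already account for essentially all of the probability.
partial = sum(nbinom_pmf(y, r, p) for y in range(200))
print(partial)
```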

Bottom Line: A negative binomial random variable can be used to model the number of negative outcomes observed until a fixed number of positive outcomes occur.

Binomial random variables

4 July, 2008 (19:50) | introductory, random variables | By: jarad

Continuing our discussion of random variables, we consider the Binomial random variable, which is a straightforward extension of the Bernoulli random variable. The extension is that we repeat the same Bernoulli trial a fixed number of times. If each trial is independent, i.e. no trial influences any other, then the Binomial random variable is the total number of positive outcomes in the set of trials.

Here are some examples:

  • number of heads in seven tosses of a coin (7,1/2)
  • number of ones and twos in thirteen rolls of a six-sided die (13,1/3)
  • number of A+ children out of three from the same couple whose blood types are both AO+ (3,3/4)

The notation we use for a binomial random variable is:

Y~Bin(n,p)

and we say `Y is a binomial random variable with a total of n trials and each trial has probability p.’

Compared with the notation for a Bernoulli, we have two parameters for this distribution. The numbers in parentheses of the above examples provide the n and p parameters respectively. The probability mass function is

f(y)=(n choose y) p^y(1-p)^{n-y}

This pmf looks almost exactly the same as the pmf for the Bernoulli random variable; the only noticeable difference is the coefficient in front. This term is known as the binomial coefficient and we say `n choose y’. This term provides the number of ways we can choose y objects from a total of n objects. It can be calculated using the following formula:

(n choose y) = n!/(y!(n-y)!)

where the exclamation points are the factorial, i.e. n! = n x (n-1) x … x 2 x 1. The second difference between this pmf and that for the Bernoulli is that y can take on integers from 0 up to n. Bernoulli is actually just a special case of the binomial with n=1.
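As a rough check on these formulas, here is a small Python sketch. It takes the binomial pmf to be f(y)=(n choose y) p^y(1-p)^{n-y} and uses the coin tossing example with n=7 and p=1/2. The function name binom_pmf is mine; math.comb is Python's built-in binomial coefficient.

```python
import math

def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Bin(n, p)."""
    if y < 0 or y > n:
        return 0.0
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

# The binomial coefficient from the factorial formula n!/(y!(n-y)!).
n, y = 7, 3
by_factorials = math.factorial(n) // (math.factorial(y) * math.factorial(n - y))
print(by_factorials, math.comb(n, y))

# Number of heads in seven tosses of a fair coin.
p = 0.5
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(probs):
    print(k, prob)

# With n = 1 the binomial reduces to the Bernoulli: P(Y=1) = p and P(Y=0) = 1-p.
print(binom_pmf(1, 1, 0.3), binom_pmf(0, 1, 0.3))
```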

Bottom Line: A binomial random variable can be used to model the sum of independent Bernoulli random variables.

Bernoulli random variables

2 July, 2008 (10:55) | introductory, random variables | By: jarad

Yesterday we introduced the idea of a random variable. Today we discuss the easiest random variable to understand, the Bernoulli random variable. Basically any time an outcome has one of two possibilities, it can be modeled as a Bernoulli random variable. Some examples include:

  • will the Minnesota Twins win the next World Series
  • will the Democrats win the next presidency
  • will the koala become extinct in the next 10 years
  • will humans set foot on Mars in the next 30 years

Each of these questions has two possible answers, which is the key to identifying outcomes that can be modeled as Bernoulli random variables. Of course, the probabilities of the positive outcomes in the above examples are not equal. Therefore we need a parameter that will define how likely the outcome is. In this case, we call the parameter p since it represents the probability of the positive outcome. If Y is the outcome of one of the examples above then we write

Y~Ber(p)

In English, this notation says `Y is a Bernoulli random variable and the probability that Y is equal to 1 is p and the probability that Y is equal to 0 is 1-p’. For example, if the Twins have a 5% chance of winning the next World Series and if Y=1 corresponds to the Twins winning, then we have Y~Ber(0.05).

The last concept we need to introduce in this post is the concept of a probability mass function or pmf. The pmf defines the probability for each outcome of our random variable. For a Bernoulli random variable the pmf is

f(y)=p^y(1-p)^{1-y}

The two possible outcomes for a Bernoulli random variable are 0 and 1. If we plug these into the pmf, we find that the probability of Y=1 is p, written P(Y=1)=p, and the probability of Y=0 is 1-p, written P(Y=0)=1-p.
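Plugging in the two outcomes can also be done in a few lines of Python. This sketch takes the pmf to be f(y)=p^y(1-p)^{1-y} and reuses the Twins example with p=0.05; the function name bernoulli_pmf is mine.

```python
def bernoulli_pmf(y, p):
    """P(Y = y) for Y ~ Ber(p); only y = 0 and y = 1 are possible."""
    if y not in (0, 1):
        return 0.0
    return p ** y * (1 - p) ** (1 - y)

p = 0.05  # the Twins example: 5% chance of winning

# y = 1 leaves p^1 * (1-p)^0 = p
print(bernoulli_pmf(1, p))

# y = 0 leaves p^0 * (1-p)^1 = 1 - p
print(bernoulli_pmf(0, p))
```

Because any number raised to the power 0 is 1, each outcome keeps only one of the two factors.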

Bottom Line: A Bernoulli random variable can be used to model any event that has two exclusive outcomes.

Random variables

1 July, 2008 (08:12) | introductory | By: jarad

You probably first remember learning about variables in your elementary algebra class. You probably remember solving for x and y in equations such as

  • x+2=3

Maybe you then moved on to geometry where you were asked to find the degrees of an angle given partial information about other angles and line segments. For example, suppose you were asked to find x and y in the figure below.

  • 30-60-90 triangle

These variables can be determined mathematically.

A random variable is a variable whose value cannot be determined mathematically, but rather it needs to be observed. Prior to observation, the value of that variable is unknown and therefore random. Some examples of random variables are

  • the outcome of a basketball game (Bernoulli)
  • the number of heads in ten tosses of a coin (Binomial)
  • the number of cars crossing a particular intersection in a 10-minute period (Poisson)
  • the average height of a group of 50 people (Gaussian)

Note that before the event occurs, you do not know what the outcome will be. The names in parentheses are the typical probability distributions used to model their associated examples. Statisticians typically denote a random variable by an upper case Roman letter, e.g.

Y~Ber(p)

This notation says that Y, the outcome of the basketball game, has a Bernoulli distribution with parameter p where p is the probability of the home team winning.
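One way to get a feel for `needs to be observed’ is to let a computer do the observing. The Python sketch below simulates the basketball game and the coin tossing examples. Everything specific in it is an assumption made for illustration: the home team's win probability of 0.6, the seed, and the function names observe_bernoulli and observe_binomial.

```python
import random

rng = random.Random(2008)  # arbitrary fixed seed so the run is repeatable

def observe_bernoulli(p):
    """Observe one Bernoulli outcome: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

def observe_binomial(n, p):
    """Observe the number of positive outcomes in n independent trials."""
    return sum(observe_bernoulli(p) for _ in range(n))

p_home = 0.6  # hypothetical probability that the home team wins

# Before each game is observed we do not know its outcome,
# but over many games the share of home wins settles near p_home.
games = [observe_bernoulli(p_home) for _ in range(10_000)]
share = sum(games) / len(games)
print(share)

# One observation of the number of heads in ten tosses of a fair coin.
heads = observe_binomial(10, 0.5)
print(heads)
```

Changing the seed changes the individual outcomes, which is exactly the point: the distribution tells us how likely each value is, not which value we will see.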

There are many more of these distributions: geometric, negative binomial, beta, chi-squared, gamma, log-normal, etc. Each random variable is associated with a probability distribution that determines the probability of realizing different values for the random variable. In subsequent posts, I will outline the different probability distributions and give examples of where each might be used.