Discrete Distribution in R
Last Updated: 26 Sep, 2024
In statistics, probability distributions can be broadly classified as continuous or discrete. A discrete distribution is one in which the random variable takes values from a countable set, often whole numbers such as 0, 1, 2, 3, and so on. Examples of discrete random variables include the number of students in a class, the result of rolling a die, and the number of customers visiting a store on a given day.
This article covers the basics of discrete distributions, their most common types, and how to work with them in R using the built-in functions of the stats package. Practical examples show how to compute probabilities and how to visualize random samples.
What is a Discrete Distribution?
A discrete distribution describes the probability of each distinct value of a random variable. Unlike continuous distributions, where the variable can take any value in an interval, discrete distributions have gaps between the possible values.
- The variable takes values from a countable set, either finite (such as 0 to n) or countably infinite (such as 0, 1, 2, ...).
- Each possible value x has a probability P(X = x) that is greater than 0 and at most 1.
- The probabilities of all possible values sum to 1.
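As a quick illustration of these properties, here is a minimal sketch in base R that models a fair six-sided die (the die example from the introduction) as a vector of probabilities. The names `faces`, `pmf` and `rolls` and the sample size of 1000 are illustrative choices, not part of any library.

```r
# A fair six-sided die: each face 1..6 has probability 1/6
faces <- 1:6
pmf <- rep(1 / 6, length(faces))

# Property checks from the list above
all(pmf > 0)                    # TRUE: every possible value has positive probability
isTRUE(all.equal(sum(pmf), 1))  # TRUE: probabilities sum to 1 (up to floating-point error)

# P(X <= 4) is the sum of the PMF over faces 1 to 4
sum(pmf[faces <= 4])            # 4/6, about 0.667

# Simulate 1000 rolls using these probabilities
set.seed(1)
rolls <- sample(faces, size = 1000, replace = TRUE, prob = pmf)
table(rolls) / length(rolls)    # relative frequencies, each close to 1/6
```

Because the die has a finite support, the PMF can be stored as a plain vector; the same checks apply to any discrete distribution whose probabilities you can list.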
Types of Discrete Distributions
- Binomial Distribution: Describes the number of successes in a fixed number of independent Bernoulli trials.
- Poisson Distribution: Represents the number of events occurring in a fixed interval of time or space.
- Geometric Distribution: Indicates the number of trials needed to achieve the first success.
- Hypergeometric Distribution: Describes the probability of k successes in n draws, without replacement, from a finite population.
Below, we discuss each of these distributions in detail using the R programming language.
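All four distributions follow the same naming convention in base R's stats package: the prefix d gives the probability mass function, p the cumulative distribution function, q the quantile function, and r random generation. The sketch below shows the four variants for the binomial case; the parameter values (10 trials, success probability 0.5) are arbitrary and chosen only for illustration.

```r
# d* : probability mass function, P(X = x)
dbinom(3, size = 10, prob = 0.5)     # 120/1024 = 0.1171875
# p* : cumulative distribution function, P(X <= x)
pbinom(3, size = 10, prob = 0.5)     # 176/1024 = 0.171875
# q* : quantile function, the smallest x with P(X <= x) >= the given probability
qbinom(0.5, size = 10, prob = 0.5)   # 5, the median
# r* : random generation
set.seed(123)
rbinom(5, size = 10, prob = 0.5)     # five random draws between 0 and 10

# p* and q* undo each other on the support of the distribution
qbinom(pbinom(3, size = 10, prob = 0.5), size = 10, prob = 0.5)   # 3
```

Swapping `binom` for `pois`, `geom` or `hyper` (with that distribution's own parameters) gives the corresponding functions used in the sections that follow.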
1. Binomial Distribution
The binomial distribution represents the number of successes in a fixed number of independent trials, with each trial having the same probability of success.
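To see where the numbers returned by dbinom() come from, the sketch below evaluates the binomial formula P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k) directly and compares it with dbinom(). The helper `binom_pmf()` is a hypothetical name introduced only for this example.

```r
# Hypothetical helper: binomial PMF written out by hand
# P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 10   # number of trials
p <- 0.5  # probability of success on each trial

manual  <- binom_pmf(0:n, n, p)
builtin <- dbinom(0:n, size = n, prob = p)

binom_pmf(4, n, p)          # 210/1024, about 0.2050781, same as dbinom(4, 10, 0.5)
all.equal(manual, builtin)  # TRUE
sum(manual)                 # 1: the probabilities over 0..n sum to one
sum((0:n) * manual)         # 5: the mean of a binomial is n * p
```

In practice dbinom() is preferable, since it stays numerically accurate for large n where the hand-written formula can overflow or lose precision.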
R
# Parameters
n <- 10 # Number of trials
p <- 0.5 # Probability of success
# Probability of getting exactly 4 successes
prob_4 <- dbinom(4, size = n, prob = p)
print(paste("Probability of getting exactly 4 successes:", prob_4))
# Cumulative probability of getting at most 4 successes
cum_prob_4 <- pbinom(4, size = n, prob = p)
print(paste("Cumulative probability of at most 4 successes:", cum_prob_4))
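# Generate 100 random values from a binomial distribution
set.seed(42) # Fix the seed so the random draws are reproducible
binom_samples <- rbinom(100, size = n, prob = p)
# Unit-width bins centred on the integers 0..n, so each count gets its own bar
hist(binom_samples, breaks = seq(-0.5, n + 0.5, by = 1),
     main = "Histogram of Binomial Distribution", xlab = "Number of successes", col = "lightblue")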
Output:
[1] "Probability of getting exactly 4 successes: 0.205078125"
[1] "Cumulative probability of at most 4 successes: 0.376953125"
[Figure: Binomial Distribution in R]
2. Poisson Distribution
The Poisson distribution represents the number of events occurring in a fixed interval, given a constant average rate.
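For a Poisson variable with average rate lambda, the probability of exactly k events is P(X = k) = exp(-lambda) * lambda^k / k!. The sketch below writes this formula out and checks it against dpois() and ppois(); the helper `pois_pmf()` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: Poisson PMF written out by hand
# P(X = k) = exp(-lambda) * lambda^k / k!
pois_pmf <- function(k, lambda) {
  exp(-lambda) * lambda^k / factorial(k)
}

lambda <- 3  # average number of events per interval
k <- 0:2

pois_pmf(2, lambda)                                    # about 0.2240418, same as dpois(2, 3)
all.equal(pois_pmf(k, lambda), dpois(k, lambda))       # TRUE
# P(X <= 2) is the PMF summed over 0, 1 and 2, which is what ppois() returns
all.equal(sum(pois_pmf(k, lambda)), ppois(2, lambda))  # TRUE
```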
R
# Parameter
lambda <- 3 # Average rate
# Probability of exactly 2 events
prob_2 <- dpois(2, lambda)
print(paste("Probability of exactly 2 events:", prob_2))
# Cumulative probability of at most 2 events
cum_prob_2 <- ppois(2, lambda)
print(paste("Cumulative probability of at most 2 events:", cum_prob_2))
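# Generate 100 random values from a Poisson distribution
set.seed(42) # Fix the seed so the random draws are reproducible
poisson_samples <- rpois(100, lambda)
# Unit-width bins centred on each integer count, from 0 up to the largest draw
hist(poisson_samples, breaks = seq(-0.5, max(poisson_samples) + 0.5, by = 1),
     main = "Histogram of Poisson Distribution", xlab = "Number of events", col = "lightgreen")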
Output:
[1] "Probability of exactly 2 events: 0.224041807655388"
[1] "Cumulative probability of at most 2 events: 0.423190081126843"
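[Figure: Poisson Distribution in R]
3. Geometric Distribution
The geometric distribution represents the number of independent trials needed to achieve the first success. Note that R's dgeom(), pgeom() and rgeom() use a different but equivalent convention: they count the number of failures before the first success, so a first success on trial t corresponds to x = t - 1.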
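The sketch below makes the trials-versus-failures convention explicit. It computes the probability of a first success on trial t as (1 - p)^(t - 1) * p and shows how to shift R's failure counts to trial numbers. The helper `first_success_on_trial()` is hypothetical and exists only for this example.

```r
p <- 0.3  # probability of success on each trial

# Hypothetical helper: probability that the FIRST success occurs on trial t (t = 1, 2, ...)
# This needs t - 1 failures followed by one success
first_success_on_trial <- function(t, p) {
  (1 - p)^(t - 1) * p
}

first_success_on_trial(5, p)   # 0.7^4 * 0.3 = 0.07203
dgeom(5 - 1, prob = p)         # same value: R's argument is the number of failures (4)

# P(first success within the first 5 trials) = P(at most 4 failures)
pgeom(4, prob = p)             # 1 - 0.7^5 = 0.83193
1 - (1 - p)^5

# rgeom() returns failure counts; add 1 to get the trial of the first success
set.seed(42)
trials <- rgeom(10, prob = p) + 1
trials
```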
R
# Parameter
p <- 0.3 # Probability of success
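# Probability of achieving the first success on the 5th trial
# dgeom() counts failures before the first success, so x = 4 failures means success on trial 5
prob_5 <- dgeom(4, prob = p)
print(paste("Probability of first success on the 5th trial:", prob_5))
# Generate 100 random values (each value is the number of failures before the first success)
set.seed(42) # Fix the seed so the random draws are reproducible
geom_samples <- rgeom(100, prob = p)
# Unit-width bins centred on each integer count, from 0 up to the largest draw
hist(geom_samples, breaks = seq(-0.5, max(geom_samples) + 0.5, by = 1),
     main = "Histogram of Geometric Distribution", xlab = "Failures before first success", col = "lightcoral")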
Output:
[1] "Probability of first success on the 5th trial: 0.07203"
[Figure: Geometric Distribution in R]
4. Hypergeometric Distribution
The hypergeometric distribution describes the probability of getting a specific number of successes in a series of draws from a finite population without replacement.
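Drawing without replacement means the probability of exactly x successes is P(X = x) = choose(m, x) * choose(n, k - x) / choose(m + n, k), where m and n are the numbers of success and failure items in the population and k is the number of draws, matching dhyper()'s argument order. The sketch below checks this formula against dhyper() and, for contrast, computes the with-replacement (binomial) probability. The helper `hyper_pmf()` is a hypothetical name used only here.

```r
m <- 15  # success items in the population
n <- 10  # failure items in the population
k <- 5   # number of draws, made without replacement

# Hypothetical helper: hypergeometric PMF written out by hand
# P(X = x) = choose(m, x) * choose(n, k - x) / choose(m + n, k)
hyper_pmf <- function(x, m, n, k) {
  choose(m, x) * choose(n, k - x) / choose(m + n, k)
}

hyper_pmf(2, m, n, k)                                    # 105 * 120 / 53130, about 0.2371542
all.equal(hyper_pmf(0:k, m, n, k), dhyper(0:k, m, n, k)) # TRUE

# For comparison: drawing WITH replacement gives a binomial with p = m / (m + n)
dbinom(2, size = k, prob = m / (m + n))                  # 0.2304
```

The two answers differ because, without replacement, each draw changes the composition of the remaining population.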
R
# Parameters
m <- 15 # Number of success items in the population
n <- 10 # Number of failure items in the population
k <- 5 # Number of draws (without replacement)
# Probability of getting exactly 2 successes in 5 draws
prob_2 <- dhyper(2, m, n, k)
print(paste("Probability of getting exactly 2 successes:", prob_2))
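# Generate 100 random values from a hypergeometric distribution
set.seed(42) # Fix the seed so the random draws are reproducible
hypergeom_samples <- rhyper(100, m, n, k)
# Unit-width bins centred on each possible count 0..k
hist(hypergeom_samples, breaks = seq(-0.5, k + 0.5, by = 1),
     main = "Histogram of Hypergeometric Distribution", xlab = "Number of successes", col = "lightyellow")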
Output:
[1] "Probability of getting exactly 2 successes: 0.237154150197629"
[Figure: Hypergeometric Distribution]
Conclusion
In this article, we explored the concept of discrete distributions and how to work with several common types in R. We covered the binomial, Poisson, geometric, and hypergeometric distributions, demonstrated functions such as dbinom(), pbinom(), and rbinom() and their counterparts for the other distributions, and visualized simulated samples with base R histograms. Discrete distributions are widely used in data analysis, in fields such as finance, biology, and engineering. With these examples, you can compute probabilities, simulate data, and visualize discrete distributions in R.