Let’s understand a few basic terms related to random variables in probability and statistics.
- Discrete and Continuous Random Variables:
- Random Variable (RV): A random variable is a variable that can take on different values, and its value depends on the outcome of a random experiment. Random variables can be classified into two main types:
- Discrete Random Variable: A discrete random variable is one that can take on a countable number of distinct values. These values are typically integers or whole numbers. Examples include the number of heads in multiple coin flips or the count of customers entering a store in a given hour.
- Continuous Random Variable: A continuous random variable is one that can take on an uncountably infinite number of values, often within a specific range. These values are typically real numbers. Examples include the height of individuals in a population or the time it takes for a computer to complete a task.
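The contrast above can be sketched with a quick simulation. This is a hypothetical illustration (the coin-flip and waiting-time scenarios are assumptions, not from the text): the number of heads is a discrete RV taking integer values, while the waiting time is a continuous RV taking real values.

```python
import random

random.seed(42)  # for reproducibility

# Discrete RV: number of heads in 10 fair coin flips -- only integers 0..10.
heads = sum(random.randint(0, 1) for _ in range(10))

# Continuous RV: a waiting time drawn uniformly from [0, 60) minutes -- any real value.
wait_minutes = random.uniform(0, 60)

print(heads)         # an integer between 0 and 10
print(wait_minutes)  # a real number in [0, 60)
```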
- Probability Mass Functions (PMFs) and Probability Density Functions (PDFs):
- Probability Mass Function (PMF): A PMF is a function that describes the probability distribution of a discrete random variable. It assigns probabilities to each possible outcome of the random variable. Mathematically, for a discrete random variable X, the PMF is denoted as P(X = x), where x represents a specific value the random variable can take.
- Probability Density Function (PDF): A PDF is a function that describes the probability distribution of a continuous random variable. Unlike the PMF, which assigns probabilities to specific values, the PDF yields probabilities only when integrated over an interval. Mathematically, for a continuous random variable X, the PDF is denoted as f(x); the value f(x) itself is a relative likelihood, not a probability, and P(a ≤ X ≤ b) is the integral of f(x) from a to b.
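To make the distinction concrete, here is a small sketch (the fair-die PMF and the standard normal PDF are illustrative choices, not from the text): the PMF entry is a genuine probability, while the PDF value at a point is a density that can even exceed 1 for narrow distributions.

```python
import math

# PMF of a fair six-sided die: P(X = x) = 1/6 for each face.
die_pmf = {face: 1 / 6 for face in range(1, 7)}

# PDF of a normal distribution: f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(die_pmf[3])       # 1/6 ~ 0.1667 -- a probability
print(normal_pdf(0.0))  # ~0.3989 -- a density, not a probability
```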
- Cumulative Distribution Functions (CDFs):
- Cumulative Distribution Function (CDF): The CDF of a random variable X is a function that gives the probability that X takes on a value less than or equal to a specified value x. For a discrete random variable, the CDF is the sum of the PMF up to that value. For a continuous random variable, the CDF is the integral of the PDF up to that value. The CDF is denoted as F(x) for X.
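Both constructions can be sketched briefly. In this illustrative example (the die and the exponential distribution are assumed choices), the discrete CDF is the running sum of the PMF, and the continuous CDF comes from integrating the PDF, which for the exponential distribution has the closed form F(x) = 1 − e^(−λx).

```python
import math

# Discrete: CDF of a fair die is the sum of the PMF over all faces <= x.
def die_cdf(x):
    return sum(1 / 6 for face in range(1, 7) if face <= x)

# Continuous: integrating the exponential PDF lam * exp(-lam * t) from 0 to x
# gives F(x) = 1 - exp(-lam * x) for x >= 0.
def exponential_cdf(x, lam=1.0):
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

print(die_cdf(3))            # P(X <= 3) = 3/6 = 0.5
print(exponential_cdf(1.0))  # ~0.632 for lam = 1
```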
- Expected Value and Variance:
- Expected Value (Mean): The expected value of a random variable is a measure of its central tendency. For a discrete random variable X, it is calculated as the sum of each possible value of X, weighted by its probability: E(X) = Σ [x * P(X = x)]. For a continuous random variable, it is calculated as the integral of x * f(x) over the entire range of possible values.
- Variance: The variance measures the spread or dispersion of a random variable’s values around its expected value. It quantifies how much the values deviate from the mean. For a discrete random variable X, it is calculated as Var(X) = Σ [(x – E(X))^2 * P(X = x)]. For a continuous random variable, it is calculated similarly but using integrals.
- Common Distributions (e.g., Binomial, Poisson, Normal):
- Binomial Distribution: The binomial distribution models the number of successes (usually denoted as X) in a fixed number of independent Bernoulli trials (experiments with two possible outcomes: success or failure). It is characterized by two parameters: n (the number of trials) and p (the probability of success in each trial).
- Poisson Distribution: The Poisson distribution models the number of events (e.g., arrivals, occurrences) happening in a fixed interval of time or space. It is characterized by a single parameter, λ (the average rate of events).
- Normal Distribution (Gaussian Distribution): The normal distribution is a continuous probability distribution that is symmetric and bell-shaped. It is characterized by two parameters: mean (μ) and standard deviation (σ). It is widely used due to the Central Limit Theorem, which states that the sum or average of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed.
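The three distributions above can be sketched with their standard formulas. This is a minimal illustration using the textbook PMF/PDF expressions (the specific parameter values in the printed examples are assumptions for demonstration).

```python
import math

# Binomial PMF: P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Poisson PMF: P(X = k) = lam^k * exp(-lam) / k!
def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Normal PDF: f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(binomial_pmf(5, 10, 0.5))   # 0.24609375 -- 5 heads in 10 fair flips
print(poisson_pmf(2, 3.0))        # ~0.2240 -- 2 events at an average rate of 3
print(normal_pdf(0.0, 0.0, 1.0))  # ~0.3989 -- standard normal density at its mean
```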