57 Statistical inference
Definition 57.1 (Statistical model) A statistical model is a formalized structure used for quantifying uncertainty, which consists of
- a set of random variables of interest: \(X_1,...,X_n\)
- a specification of the joint distribution of the random variables \(f(X_1,...,X_n; \theta_1,...,\theta_k)\)
- the parameters \(\theta_1,...,\theta_k\) that determine the distribution
- (possibly) the distributions of the parameters \(h(\theta_1,...,\theta_k)\).
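For example, a simple coin-tossing model takes \(X_1,...,X_n\) to be the outcomes of \(n\) tosses, specifies the joint distribution \(f(x_1,...,x_n; \theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}\), has a single parameter \(\theta\) (the probability of heads), and, in a Bayesian treatment, additionally specifies a prior distribution \(h(\theta)\), e.g. \(\theta \sim \mathrm{Beta}(a, b)\).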
Definition 57.2 (Statistical inference) A statistical inference is a procedure that produces a probabilistic statement about some or all parts of a statistical model (e.g. a parameter, a conditional distribution, etc.).
Definition 57.3 (Parameter space) The characteristics that determine the joint distribution of the random variables of interest are called the parameters of the distribution. The set of all possible values of the parameters is called the parameter space. In the coin-tossing model above, for instance, the parameter space is \(\Theta = [0, 1]\).
There is a debate over whether unknown parameters should be treated as random variables or merely as fixed numbers. Treating parameters as fixed numbers is typically adopted by the frequentist framework: we use the observables to construct the best estimate of the unknown parameter, e.g. \[\bar{X}_n \to_p \mu.\] Treating parameters as random variables is typically associated with the Bayesian framework: we update the distribution of the parameters with the information provided by the observables, \[p(\theta \mid x_1,...,x_n) \propto p(x_1,...,x_n \mid \theta)\, p(\theta).\]
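To make the contrast concrete, here is a minimal Python sketch (ours, not from the text) for the coin-tossing model: the frequentist estimate is the sample mean, while the Bayesian posterior follows from Beta-Bernoulli conjugacy. The true parameter, the sample size, and the flat Beta(1, 1) prior are all chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated coin flips with true (unknown) success probability 0.7.
theta_true = 0.7
x = rng.binomial(1, theta_true, size=50)

# Frequentist: theta is a fixed number; estimate it with the sample mean.
theta_hat = x.mean()

# Bayesian: theta is a random variable with a Beta(a, b) prior.
# By Beta-Bernoulli conjugacy the posterior is Beta(a + #successes, b + #failures).
a, b = 1.0, 1.0                      # illustrative flat prior
a_post = a + x.sum()
b_post = b + len(x) - x.sum()
posterior_mean = a_post / (a_post + b_post)

print(f"frequentist estimate: {theta_hat:.3f}")
print(f"posterior mean:       {posterior_mean:.3f}")
```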
Definition 57.4 (Statistic) Suppose we observe random variables \(X_1,...,X_n\). Let \(g\) be a real-valued function of \(n\) variables. Then the random variable \(T=g(X_1,...,X_n)\) is called a statistic.
Definition 57.5 (Sampling distribution) A statistic \(T\) is a function of random variables, so it is itself a random variable. The distribution of \(T\) is called the sampling distribution of \(T\).
The name comes from the fact that \(T\) depends on the sampling process: a different sample gives a different value of the statistic.
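The sampling distribution can be approximated by simulation: draw many samples, recompute the statistic on each, and look at the resulting values. A minimal sketch for the sample mean of an Exponential(1) sample (the distribution, sample size, and replication count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 10_000

# Draw `reps` independent samples of size n and compute the statistic
# T = sample mean on each one; the resulting values are draws from
# the sampling distribution of T.
samples = rng.exponential(scale=1.0, size=(reps, n))
T = samples.mean(axis=1)

print(f"mean of T:    {T.mean():.3f}  (population mean is 1)")
print(f"std dev of T: {T.std():.3f}  (theory: 1/sqrt(n) = {1/np.sqrt(n):.3f})")
```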
Definition 57.6 (Estimator) Let \(\theta\) be a parameter of interest in a statistical model. An estimator of parameter \(\theta\) is a statistic \(\hat\theta = g(X_1,...,X_n)\) constructed to learn about \(\theta\). If \(X_1=x_1, ..., X_n=x_n\) are observed, then \(g(x_1,...,x_n)\) is called the estimate of \(\theta\).
Constructing an estimator
How to construct an estimator is an art in itself, one that needs to be guided by principles. We introduce methods for constructing estimators in the following sections.
Definition 57.7 (Method of moments) Let \(X_1,...,X_n\) be a random sample from a distribution with at least \(k\) finite moments. Let \(m_j = E(X^j; \theta)\) be the \(j\)-th order moment, \(j=1,...,k\). Suppose the parameter of interest \(\theta\) can be expressed as a function of the moments: \(\theta = M(m_1, ..., m_k)\). The method-of-moments estimator of \(\theta\) is given by \[\hat\theta = M(\hat m_1, ..., \hat m_k),\] where \(\hat m_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j\) is the \(j\)-th sample moment, \(j=1,...,k\).
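As a worked illustration (the Gamma example is ours, not from the text): for a \(\mathrm{Gamma}(k, \theta)\) distribution, \(m_1 = k\theta\) and \(m_2 - m_1^2 = k\theta^2\), so \(k = m_1^2/(m_2 - m_1^2)\) and \(\theta = (m_2 - m_1^2)/m_1\). Plugging in the sample moments gives the method-of-moments estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sample from Gamma(shape=2, scale=3); in practice the data are given.
k_true, theta_true = 2.0, 3.0
x = rng.gamma(k_true, theta_true, size=5_000)

# Sample moments: m1_hat = mean(X), m2_hat = mean(X^2).
m1 = np.mean(x)
m2 = np.mean(x**2)

# Invert the moment equations  m1 = k*theta,  m2 - m1^2 = k*theta^2.
k_hat = m1**2 / (m2 - m1**2)
theta_hat = (m2 - m1**2) / m1

print(f"k_hat = {k_hat:.3f}, theta_hat = {theta_hat:.3f}")
```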
The method of moments is guided by the LLN, since the sample moments converge to the true moments in large samples. As an example, the method-of-moments estimator for the population mean \(\mu = m_1\) is \[\hat\mu = \frac{1}{n}\sum_{i=1}^{n} X_i.\] Since \(\sigma^2 = m_2 - m_1^2\), the estimator for the population variance is \[\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}_n^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2.\] The LLN ensures \(\hat\sigma^2 \to_p \sigma^2\) as the sample size grows, but \(\hat\sigma^2\) is biased in finite samples: \(E(\hat\sigma^2) = \frac{n-1}{n}\sigma^2\).
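A quick simulation (ours, for illustration) makes that finite-sample bias visible: with \(n = 5\), the method-of-moments estimator averages about \(\frac{n-1}{n}\sigma^2 = 0.8\sigma^2\), while the Bessel-corrected sample variance that divides by \(n-1\) is unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000
sigma2 = 1.0  # true variance of N(0, 1)

x = rng.normal(0.0, 1.0, size=(reps, n))

# Method-of-moments estimator: divides by n, biased downward by (n-1)/n.
mom = x.var(axis=1)              # ddof=0 by default
# Bessel-corrected sample variance: divides by n-1, unbiased.
unbiased = x.var(axis=1, ddof=1)

print(f"E[mom]      ~ {mom.mean():.3f}  (theory: {(n - 1) / n * sigma2:.3f})")
print(f"E[unbiased] ~ {unbiased.mean():.3f}  (theory: {sigma2:.3f})")
```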