Definition

A least squares estimate (LSE) of \(\beta\) is any vector \(\hat\beta\) that minimizes \[||Y-X\beta||^2\] over \(\beta \in \mathbb{R}^p\). Thus, \(\hat\beta\) is a least squares estimate if and only if \(X\hat\beta=\hat\mu\;(=PY)\), where \(P\) is the orthogonal projection onto the column space of \(X\).
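
A one-line sketch of why minimizing over \(\beta\) is the same as projecting: since \(Y-PY\) is orthogonal to the column space of \(X\),
\[
||Y-X\beta||^2=||Y-PY||^2+||PY-X\beta||^2\ \ge\ ||Y-PY||^2,
\]
with equality if and only if \(X\beta=PY\).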

  

Proposition
  1. If \(\hat\beta=(X'X)^-X'Y\), where \((X'X)^-\) is any generalized inverse of \(X'X\), then \(\hat\beta\) is a least squares estimate (one of possibly many).

  2. If \(\text{rank}(X)=p\), then the least squares estimate is unique (only one).

  3. If \(\text{rank}(X)< p\), then there are infinitely many \(\beta\)'s such that \(X\beta=\hat\mu\); the set of LSEs forms an affine subspace of \(\mathbb{R}^p\).
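
A minimal numerical sketch of points 1 and 3 of the proposition, assuming NumPy; the rank-deficient design matrix and response below are made up purely for illustration:

```python
import numpy as np

# Made-up rank-deficient design: third column = first + second, so rank(X) = 2 < p = 3.
X = np.array([[1., 0., 1.],
              [1., 0., 1.],
              [1., 1., 2.],
              [1., 1., 2.]])
Y = np.array([1., 2., 3., 5.])

# One least squares estimate: beta_hat = (X'X)^- X'Y, using the Moore-Penrose inverse.
beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ Y

# Adding any vector in the null space of X gives another LSE with the same fit.
null_dir = np.array([1., 1., -1.])          # X @ null_dir = 0
beta_alt = beta_hat + 2.5 * null_dir

# Both produce the same fitted values mu_hat = P Y, so both minimize ||Y - X beta||^2.
print(np.allclose(X @ beta_hat, X @ beta_alt))   # True
```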

  

Definition (identifiability)

Let \(P_\theta=N(X\beta, \sigma^2I)\), \(\theta=(\beta,\sigma^2)\), be a parametric family of distributions, and let \(\tau(\theta)=\lambda'\beta\) (a function of \(\theta\)). The parameter \(\tau(\theta)=\lambda'\beta\) is identifiable if \[ \lambda'\beta_1 \ne \lambda'\beta_2 \implies X\beta_1 \ne X\beta_2, \] equivalently, if \(X\beta_1=X\beta_2\) implies \(\lambda'\beta_1=\lambda'\beta_2\).

Examples (each is worked out briefly after the list)

  1. One-way ANOVA model: \(Y_{ij}=\alpha+\mu_i+\epsilon_{ij}\), \(i=1,2,3\), where \(\beta=(\alpha, \mu_1,\mu_2,\mu_3)'\).

  2. Simple linear regression: consider a regression with one predictor, with \(X=\begin{bmatrix}1 & c\\\vdots &\vdots\\1 & c \end{bmatrix}\), where \(c\) is a constant.

  3. Multiple linear regression: consider a regression with two predictors, with \(X=\begin{bmatrix}1 & d_1 & cd_1\\\vdots &\vdots&\vdots\\1 & d_n & cd_n \end{bmatrix}\), where \(c\) is a constant. Assume that the \(d_i\)'s are not all equal.
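
A brief sketch of the identifiability conclusions these examples illustrate (the reasoning below is the standard one, filling in what the notes leave implicit; \(\beta=(\beta_0,\beta_1)'\) and \(\beta=(\beta_0,\beta_1,\beta_2)'\) denote the regression coefficients in examples 2 and 3):

  1. One-way ANOVA: the intercept column of \(X\) equals the sum of the three group-indicator columns, so \(\text{rank}(X)=3<4\). Replacing \((\alpha,\mu_1,\mu_2,\mu_3)\) by \((\alpha+c,\mu_1-c,\mu_2-c,\mu_3-c)\) leaves \(X\beta\) unchanged, so \(\alpha\) and the individual \(\mu_i\) are not identifiable, while \(\alpha+\mu_i\) and contrasts such as \(\mu_1-\mu_2\) are.

  2. Simple linear regression with a constant predictor: the two columns of \(X\) are proportional, so \(\text{rank}(X)=1\) and \(X\beta=(\beta_0+c\beta_1)\mathbf{1}_n\); only \(\beta_0+c\beta_1\) is identifiable, not \(\beta_0\) or \(\beta_1\) separately.

  3. Multiple linear regression: the third column is \(c\) times the second, so \(X\beta=\beta_0\mathbf{1}_n+(\beta_1+c\beta_2)d\) with \(d=(d_1,\dots,d_n)'\); since the \(d_i\)'s are not all equal, \(\beta_0\) and \(\beta_1+c\beta_2\) are identifiable, but \(\beta_1\) and \(\beta_2\) are not.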

  

Remark

If \(\text{rank}(X)<p\), then there exists a component of \(\beta\) that is not identifiable.

If \(\text{rank}(X)=p\), then all components of \(\beta\) are identifiable.

  

Definition

\(\lambda'\beta\) is estimable if there exists \(a_{n\times 1}\) such that \(E(a'Y)=\lambda'\beta\) for all \(\beta\),

i.e., \(\lambda'\beta\) is estimable if we can write \(\lambda'\beta=a'X\beta\) for some \(a\).
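A small numerical sketch of this criterion, assuming NumPy (the rank-deficient design is the same made-up one as in the earlier sketch): \(\lambda'\beta\) is estimable exactly when \(\lambda\) lies in the row space of \(X\), i.e., when \(X'a=\lambda\) has a solution.

```python
import numpy as np

def is_estimable(X, lam, tol=1e-8):
    """lambda'beta is estimable iff lambda' = a'X for some a, i.e., X'a = lam is solvable."""
    a, *_ = np.linalg.lstsq(X.T, lam, rcond=None)   # least squares solution of X'a = lam
    return np.allclose(X.T @ a, lam, atol=tol)

# Made-up rank-deficient design: column 3 = column 1 + column 2.
X = np.array([[1., 0., 1.],
              [1., 0., 1.],
              [1., 1., 2.],
              [1., 1., 2.]])

print(is_estimable(X, np.array([1., 0., 1.])))   # True : beta_1 + beta_3 is estimable
print(is_estimable(X, np.array([0., 0., 1.])))   # False: beta_3 alone is not
```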

  

Definition

For \(\theta=\Lambda'\beta\), \(\hat\theta\) is a least squares estimate of \(\theta\) if \(\hat\theta=\Lambda'\hat\beta\), where \(\hat\beta\) is any least squares estimate of \(\beta\).

  

Proposition

Suppose \(\theta=\Lambda'\beta\) is estimable, i.e., there exists a matrix \(A\) such that \(\Lambda'\beta=A'X\beta\) for all \(\beta\) (equivalently, \(\Lambda'=A'X\)). Then \(\Lambda'\beta\) has a unique least squares estimate, given by \(\hat\theta=A'PY\), where \(P\) is the projection onto the space spanned by the columns of \(X\). (Sketch: every least squares estimate \(\hat\beta\) satisfies \(X\hat\beta=PY\), so \(\Lambda'\hat\beta=A'X\hat\beta=A'PY\) does not depend on which \(\hat\beta\) is used.)

  

Theorem

Suppose \(Y=X\beta+\epsilon\), \(\epsilon\sim (0,\sigma^2I)\). If there exists an \(A\) satisfying \(\Lambda'=A'X\), then

  1. \(E(\Lambda'\hat\beta)=\Lambda'\beta\);

  2. \(\text{Var}(\Lambda'\hat\beta)=\sigma^2\Lambda'(X'X)^-\Lambda\), where \((X'X)^-\) is any generalized inverse of \(X'X\).

 

In particular, if \(X\) is of full rank, so that \(X'X\) is invertible, then we can let \(A'=(X'X)^{-1}X'\). Then, \(A'X=I_{p\times p}\), and therefore

  1. \(\hat\beta=(X'X)^{-1}X'Y\);

  2. \(E(\hat\beta)=\beta\);

  3. \(\text{Var}(\hat\beta)=\sigma^2(X'X)^{-1}\).
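
A short numerical sketch of these full-rank formulas, assuming NumPy; the design, the true \(\beta\), and \(\sigma\) below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up full-rank design: intercept plus one non-constant predictor.
n, sigma = 50, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, -3.0])
Y = X @ beta + rng.normal(scale=sigma, size=n)

# Closed-form least squares estimate and its covariance under Var(eps) = sigma^2 I.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
cov_beta_hat = sigma**2 * XtX_inv            # Var(beta_hat) = sigma^2 (X'X)^{-1}

# Agrees with a generic least squares solver.
print(np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0]))   # True
```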

  

Theorem

Let \(Y\sim N(X\beta, \sigma^2I)\). If \(r=\text{rank}(X)\), then, \(E(Y'(I-P)Y)=(n-r)\sigma^2\).
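
A standard sketch of the computation, using \(E(Z'AZ)=\text{tr}(A\,\text{Var}(Z))+E(Z)'A\,E(Z)\), together with \((I-P)X=0\) and \(\text{tr}(I-P)=n-\text{tr}(P)=n-r\):
\[
E\big(Y'(I-P)Y\big)=\text{tr}\big((I-P)\,\sigma^2 I\big)+\beta'X'(I-P)X\beta=\sigma^2(n-r)+0=(n-r)\sigma^2.
\]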

  

Very important

Theorem

Assume \(Y\) is normally distributed, \(Y\sim N(X\beta, \sigma^2I)\). Let \(r=\text{rank}(X)\). Suppose \(\Lambda'\beta\) is estimable, i.e., there exists \(A\) such that \(\Lambda'=A'X\). Then,

  1. \(\Lambda'\hat\beta=A'PY\), so \(\Lambda'\hat\beta\) is normally distributed: \[ \Lambda'\hat\beta \sim N(\Lambda'\beta, \sigma^2\Lambda'(X'X)^-\Lambda), \]

  2. \(\frac{Y'(I-P)Y}{\sigma^2}\sim \chi^2_{n-r}\),

  3. \(\Lambda'\hat\beta\) and \(Y'(I-P)Y\) are independent.
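
A quick Monte Carlo sketch of part 1, assuming NumPy (made-up rank-deficient design, \(\sigma^2=1\), and an estimable \(\lambda\)); the empirical mean and variance of \(\lambda'\hat\beta\) should match \(\lambda'\beta\) and \(\lambda'(X'X)^-\lambda\):

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up rank-deficient design: column 3 = column 1 + column 2.
X = np.array([[1., 0., 1.],
              [1., 0., 1.],
              [1., 1., 2.],
              [1., 1., 2.]])
n = X.shape[0]
XtX_ginv = np.linalg.pinv(X.T @ X)        # one choice of generalized inverse (X'X)^-
lam = np.array([1., 0., 1.])              # estimable: lam' = a'X with a = (1,0,0,0)'
beta = np.array([1., 2., 0.])

est = np.empty(20000)
for i in range(est.size):
    Y = X @ beta + rng.normal(size=n)     # sigma^2 = 1
    est[i] = lam @ (XtX_ginv @ X.T @ Y)   # lam'beta_hat ( = A'PY )

print(est.mean(), lam @ beta)             # both close to lam'beta = 1
print(est.var(), lam @ XtX_ginv @ lam)    # both close to sigma^2 lam'(X'X)^- lam
```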

  

Very important

Remark

Assume \(Y\) is normally distributed, \(Y\sim N(X\beta, \sigma^2I)\), and let \(r=\text{rank}(X)\). Let \(\hat\sigma^2=\frac{Y'(I-P)Y}{n-r}\) and \(\Sigma=(X'X)^-\).

If \(X\) is of full rank, then \[ \hat\beta_j-\beta_j\sim N(0, \sigma^2\Sigma_{jj}),\\ \frac{(n-r)\hat\sigma^2}{\sigma^2}\sim \chi_{n-r}^2 \] and these are independent.

Thus, \[ \frac{(\hat\beta_j-\beta_j)/\sqrt{\sigma^2\Sigma_{jj}}}{\sqrt{\hat\sigma^2/\sigma^2}}\stackrel{\text{dist}}{=} \frac{N(0,1)}{\sqrt{\chi^2_{n-r}/(n-r)}}\sim t_{n-r}.\\ \implies \frac{(\hat\beta_j-\beta_j)}{\sqrt{\hat{\sigma}^2\Sigma_{jj}}}\sim t_{n-r}. \]

Also, for any estimable \(\lambda'\beta\) (i.e., \(\lambda'\beta=a'X\beta\) for some \(a\)), \[ \frac{\lambda'\hat\beta-\lambda'\beta}{\sqrt{\hat{\sigma}^2\lambda'(X'X)^-\lambda}}\sim t_{n-r}. \]
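
A short sketch of using the \(t_{n-r}\) pivot in practice, assuming NumPy and SciPy; the design and data-generating values below are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up full-rank design: intercept plus one predictor.
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([2.0, 0.5]) + rng.normal(scale=1.5, size=n)

r = np.linalg.matrix_rank(X)                       # here r = p = 2
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - r)               # hat sigma^2 = Y'(I-P)Y / (n-r)

# 95% confidence interval for beta_j from (beta_hat_j - beta_j)/sqrt(hat sigma^2 Sigma_jj) ~ t_{n-r}.
j = 1
se = np.sqrt(sigma2_hat * XtX_inv[j, j])
tcrit = stats.t.ppf(0.975, df=n - r)
print(beta_hat[j] - tcrit * se, beta_hat[j] + tcrit * se)
```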

  

Theorem

Assume Model NDA (no distributional assumption) and that \(X\) is of full rank. If \(\tilde\beta\) is any linear unbiased estimator of \(\beta\), then \(\hat\beta_{\text{LS}}\) is at least as good as \(\tilde\beta\), in the sense that \(\text{Var}(\tilde\beta)-\text{Var}(\hat\beta_{\text{LS}})\) is nonnegative definite (this is the Gauss-Markov theorem).

  

Theorem (Lehmann-Scheffé)

If \(Y\) is normally distributed \((Y\sim N(X\beta, \sigma^2I))\), then for any estimable parameter \(\lambda'\beta\), \(\lambda'\hat\beta_{\text{LS}}\) has minimum variance among all unbiased estimators of \(\lambda'\beta\) (linear or not).

Also, \(\hat\sigma^2\) has minimum variance among all unbiased estimators of \(\sigma^2\).

  
