Components of GLM

  1. Random component: it specifies \(Y\) and its probability distributions.

  2. Systematic component : it specifies explanatory variables.

  3. Link function : it specifies function of \(E(Y)\) to be a model.

  

Random Component

Let \(Y=(Y_1,\ldots,Y_n)\) where \(Y_i\) is a response for the subject \(i\). Then,

  1. \(Y_i\)’s are independent.

  2. Probability distribution function for \(Y_i\) has the form of exponential dispersion family, \[\begin{equation*} f(y_i|\theta_i, \phi)= \exp[\{y_i\theta_i-b(\theta_i)\}/a(\phi)+c(y_i;\phi)], \end{equation*}\] where \(\theta_i\) and \(\phi\) are a natural parameter and a dispersion parameter, respectively.

  

Examples
  1. Poisson data : \(Y_i \sim \text{poisson}(\mu_i)\). Then, \[\begin{eqnarray*} f(y_i;\mu_i)&=& \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}\\ &=& \exp\{y_i \log \mu_i -\mu_i -\log y_i!\}. \end{eqnarray*}\]

    Thus, \(\theta_i=\log \mu_i\), \(b(\theta_i)=\mu_i=e^{\theta_i}\), \(a(\phi)=1\), \(c(y_i;\phi)= -\log y_i!\).

    (여기서 \(\theta_i\)를 natural parameter라고 부른다. 이는 exponential family에서의 parameter로서 우리가 배우는 GLM의 목적 중 하나는 데이터의 parameter를 exponential family의 natural parameter와 연결시키는 것이다.)

 

  1. Binary data : \(Y_i \sim \text{Bin}(n_i, \pi_i)\). That is

    \(Y_i\): Proportion of successes out of \(n_i\) trials with success probability \(\pi_i\)

    Then, for \(Y_i=0, \frac{1}{n_i}, \ldots, 1,\) \[\begin{eqnarray*} f(y_i;\pi_i, n_i)&=& {n_i \choose n_i\pi_i} \pi^{n_iy_i}(1-\pi_i)^{n_i(1-y_i)}\\ &=& \exp \left\{ n_iy_i\log \pi_i + n_i\log(1-\pi_i)-n_iy_i\log(1-\pi_i)+\log{n_i \choose n_iy_i}\right\}\\ &=& \exp \left\{ \frac{y_i\log \frac{\pi_i}{1-\pi_i} + \log(1-\pi_i)}{1/n_i}+\log{n_i \choose n_iy_i}\right\} \end{eqnarray*}\]

    Thus, \(\theta_i=\log \frac{\pi_i}{1-\pi_i}=\text{logit}{\pi_i}\implies \pi_i=\frac{e^{\theta_i}}{1+e^{\theta_i}},\)
    \(b(\theta_i)= -\log(1-\pi_i)=\log(1+e^{\theta_i}),\)

    \(a(\phi)=1/n_i\), \(c(y_i;\phi)=\log{n_i \choose n_iy_i}\).

 

  1. Normal data : \(Y_i \sim N(\mu_i, \sigma^2)\).

    Then, \[\begin{eqnarray*} f(y_i;\mu_i, \sigma^2)&=& \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_i-\mu_i)^2 \right)\\ &=& \exp\left(-\log \sqrt{2\pi}-\frac{1}{2}\log\sigma^2-\frac{1}{2\sigma^2}(y_i^2-2y_i\mu_i+\mu_i^2)\right)\\ &=&\exp\left(\frac{y_i\mu_i-\frac{1}{2}\mu_i^2}{\sigma^2}-\frac{y_i^2}{2\sigma^2}-\frac{1}{2}\log \sigma^2 -\log\sqrt{2\pi} \right) \end{eqnarray*}\]

    Thus, we have \(\theta_i=\mu_i\), \(b(\theta_i)= \frac{\theta_i^2}{2}\), \(a(\phi)=\sigma^2\).

  

Moments of \(Y\)

Note that

\(l(\theta_i; \phi, y_i)= \log f(y_i;\theta_i, \phi) = \frac{ y_i \theta_i-b(\theta_i)}{a(\phi)}+c(y_i;\phi)\).

Then,

\[ \frac{d l}{d \theta_i}= \frac{y_i-b'(\theta_i)}{a(\phi)}. \]

Since \(E\left[\frac{d l}{d \theta_i}\right]=0\) from Bartlett’s Identity, we have \[ E\left[\frac{d l}{d \theta_i}\right]= E\left[\frac{Y_i-b'(\theta_i)}{a(\phi)}\right]=0\\ \iff b'(\theta_i)= E[Y_i]. \]

Next, we have \[ \frac{d^2 l}{d \theta_i^2}= \frac{-b''(\theta_i)}{a(\phi)}. \]

Since \(E\left[\frac{d^2 l}{d \theta_i^2}\right]= -E\left[ \left(\frac{d l}{d \theta_i}\right)^2 \right]\), we have

\[ \frac{-b''(\theta_i)}{a(\phi)}=-E\left[ \left(\frac{Y_i-b'(\theta_i)}{a(\phi)} \right)^2 \right]\\ \iff \frac{b''(\theta_i)}{a(\phi)}= \frac{\text{Var}(Y_i)}{a(\phi)^2}\\ \iff \text{Var}(Y_i)= a(\phi)b''(\theta_i). \]

  

Examples
  1. \(Y_i\sim \text{poisson}(\mu_i)\) : \(\mu_i=e^{\theta_i}\), \(b(\theta_i)= e^{\theta_i}\), and \(a(\phi)=1\).

    Then, we have

    \(E(Y_i)= b'(\theta_i)=e^{\theta_i}=\mu_i\),

    \(\text{Var}(Y_i)=a(\phi_i)b''(\theta_i)= b''(\theta_i)=e^{\theta_i}=\mu_i\).

 

  1. \(n_iY_i\sim \text{Bin}(n_i, \pi_i)\), where \(\theta_i=\log\frac{\pi_i}{1-\pi_i}\), \(b(\theta_i)=\log(1+e^{\theta_i})\), \(a(\phi)=1/n_i\).

    Then, we have

    \(E(Y_i)=b'(\theta_i)= \frac{e^{\theta_i}}{1+e^{\theta_i}}\),

    \(\text{Var}(Y_i)=a(\phi)b''(\theta_i)= \frac{\pi_i(1-\pi_i)}{n_i}.\)

  

Systematic Component

Let \((x_{i0},x_{i1},\ldots,x_{ip})\) be value of \(p\) explanatory variables for subject \(i\). Then, \[ \eta_i=\sum_{j=0}^p x_{ij} \beta_j \]

is called systematic component with unknown parameter \((\beta_0,\beta_1,\ldots,\beta_p)\).

  

Examples
  1. \(Y_i \sim \text{poisson}(\mu_i)\) : \(\mu_i=b'(\theta_i)=e^{\theta_i}=g^{-1}(\theta_i)\).

    • Canonical link : \(\theta_i=g(\mu_i)=\log \mu_i\).

     

  2. \(n_iY_i\sim \text{Bin}(n_i,\pi_i)\) : \(\pi_i=b'(\theta_i)=\frac{e^{\theta_i}}{1+e^{\theta_i}}=g^{-1}(\theta_i)\).

    • Canonical link : \(\theta_i=\log \frac{\pi_i}{1-\pi_i}\).

 

  1. \(Y_i\sim N(\mu_i,\sigma^2)\) : \(\mu_i=b'(\theta_i)= \theta_i= g^{-1}(\theta_i)\)

    • Canonical link : \(\mu_i= \theta_i\).

  

back