Model of interest: fitted values \(\{\hat\mu_i\}\) with corresponding canonical parameters \(\{\hat\theta_i\}\).
The ‘saturated’ model has maximum likelihood estimates \(\tilde{\mu_i}=y_i\), \(i=1,2,\ldots,N\).
Recall the log-likelihood is \[ l(\theta;y)=\sum_{i=1}^N\{y_i \theta_i-b(\theta_i) \}/a(\phi)+\sum_{i=1}^N c(y_i;\phi). \] Let \(l(\hat{\theta};y)=l(\hat{\mu};y)\) denote the log-likelihood maximized over \(\beta\),
and \(l(\tilde{\theta};y)=l(y;y)\) the log-likelihood of the saturated model.
The likelihood ratio (LR) statistic for testing “\(H_0 : \mbox{model holds}\)” is \[\begin{eqnarray*} \Lambda &=& -2\log \frac{\text{max. likelihood under } H_0}{\text{max. likelihood under saturated model}}\\ &=& 2[l(y;y)-l(\hat\mu;y)] \\ &=& 2\sum_{i=1}^N \left[\{ y_i\tilde{\theta_i}-b(\tilde\theta_i)\}-\{ y_i\hat{\theta_i}-b(\hat\theta_i)\}\right]/a(\phi).\\ \end{eqnarray*}\]
Suppose \(a(\phi)=\phi/w_i\). Then,
\[\begin{eqnarray*} \Lambda &=& 2\sum_{i=1}^N \left[\{ y_i\tilde{\theta_i}-b(\tilde\theta_i)\}-\{ y_i\hat{\theta_i}-b(\hat\theta_i)\}\right]/a(\phi)\\ &=& 2\sum_{i=1}^N w_i \{ y_i(\tilde{\theta_i}-\hat{\theta_i})-b(\tilde\theta_i)+b(\hat\theta_i)\}/\phi\\ &=& D(y;\hat\mu)/\phi, \end{eqnarray*}\]
if we define \(D(y;\hat\mu)=2\sum_{i=1}^N w_i \{ y_i(\tilde{\theta_i}-\hat{\theta_i})-b(\tilde\theta_i)+b(\hat\theta_i)\}\).
Here, \(D(y;\hat\mu)\) is called the deviance and \(\Lambda=D(y;\hat\mu)/\phi\) is called the scaled deviance.
Note that \(D(y;\hat\mu)\ge 0\): the larger the deviance, the poorer the fit.
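A minimal numerical sketch of the definition \(\Lambda=2[l(y;y)-l(\hat\mu;y)]\), assuming a Poisson GLM fitted with Python's statsmodels on simulated data (so \(\phi=1\) and the scaled deviance equals the deviance):

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import gammaln, xlogy

rng = np.random.default_rng(0)
N = 100
x = rng.normal(size=N)
y = rng.poisson(np.exp(0.5 + 0.3 * x))

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Log-likelihood of the saturated model: mu_i = y_i (xlogy handles y_i = 0).
l_sat = np.sum(xlogy(y, y) - y - gammaln(y + 1))

scaled_dev = 2 * (l_sat - fit.llf)   # Lambda = 2[l(y;y) - l(mu_hat;y)]
print(scaled_dev, fit.deviance)      # the two should agree
```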
The deviance can also be used to compare a model \(M_0\) nested within a larger model \(M_1\).
Since \(l(\hat\mu_1;y)\ge l(\hat\mu_0;y)\), the LR statistic comparing the two models is \[\begin{eqnarray*} \Lambda&=& -2 [ l(\hat\mu_0;y)- l(\hat\mu_1;y)]\\ &=& -2 [l(\hat\mu_0;y) - l(y;y) + l(y;y) - l(\hat\mu_1;y) ]\\ &=& D(y;\hat\mu_0)/\phi-D(y;\hat\mu_1)/\phi \quad (\ge 0)\\ &=& 2\sum_{i=1}^N w_i\{y_i(\tilde\theta_i-\hat\theta_{i0})-b(\tilde\theta_i)+b(\hat\theta_{i0})-y_i(\tilde\theta_i-\hat\theta_{i1})+b(\tilde\theta_i)-b(\hat\theta_{i1})\}/\phi\\ &=& 2\sum_{i=1}^N w_i\{y_i(\hat\theta_{i1}-\hat\theta_{i0})+b(\hat\theta_{i0})-b(\hat\theta_{i1})\}/\phi\\ &\xrightarrow{d} & \chi^2_{p_1-p_0} \quad \text{under } M_0, \end{eqnarray*}\] where \(p_1\) and \(p_0\) are the numbers of parameters in models \(M_1\) and \(M_0\), respectively.
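A sketch of this nested comparison (hypothetical simulated data; statsmodels and scipy assumed): fit an intercept-only Poisson model \(M_0\) and a one-covariate model \(M_1\), then refer the deviance difference to \(\chi^2_{p_1-p_0}\).

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.poisson(np.exp(0.2 + 0.4 * x))

fit0 = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Poisson()).fit()  # M0
fit1 = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()    # M1

lam = fit0.deviance - fit1.deviance   # D(y;mu0)/phi - D(y;mu1)/phi with phi = 1
p_value = chi2.sf(lam, df=1)          # df = p1 - p0 = 2 - 1
print(lam, p_value)
```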
Example (Poisson): here \(a(\phi)=\phi=1\) and \(w_i=1\), so the deviance and scaled deviance coincide. We have
\(\hat{\theta_i}=\log{\hat\mu_i}\), \(b(\hat\theta_i)=e^{\hat\theta_i}=\hat\mu_i\),
and \(\tilde{\theta_i}=\log y_i\), \(b(\tilde\theta_i)=e^{\tilde\theta_i}=y_i\).
Thus, \[\begin{eqnarray*} D(y;\hat\mu)/\phi = D(y;\hat\mu) &=& 2\sum_{i=1}^N \{y_i(\log y_i-\log \hat\mu_i)-y_i + \hat\mu_i\}\\ &=& 2\sum_{i=1}^N \left\{y_i\log \frac{y_i}{\hat\mu_i}-y_i + \hat\mu_i\right\}. \end{eqnarray*}\]
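A direct check of this closed-form expression (a sketch; `xlogy` implements the convention \(y_i\log y_i=0\) when \(y_i=0\), and the intercept-only MLE \(\hat\mu_i=\bar y\) is used purely for illustration):

```python
import numpy as np
from scipy.special import xlogy

def poisson_deviance(y, mu):
    # D(y; mu_hat) = 2 * sum{ y_i log(y_i / mu_i) - y_i + mu_i }
    return 2 * np.sum(xlogy(y, y / mu) - y + mu)

y = np.array([2, 0, 5, 3, 1])
mu = np.full_like(y, y.mean(), dtype=float)  # intercept-only fit: mu_i = y_bar
print(poisson_deviance(y, mu))
```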
Example (Normal): here \(a(\phi)=\phi=\sigma^2\) and \(w_i=1\). We have
\(\hat{\theta_i}=\hat\mu_i\), \(b(\hat\theta_i)=\hat\mu_i^2/2\),
and \(\tilde{\theta_i}=y_i\), \(b(\tilde\theta_i)=y_i^2/2\).
Thus, \[\begin{eqnarray*} D(y;\hat\mu)&=& 2\sum_{i=1}^N \{y_i(y_i-\hat\mu_i) -y_i^2/2+ \hat\mu_i^2/2\}\\ &=& 2\sum_{i=1}^N \{(y_i^2-y_i\hat\mu_i)-y_i^2/2+ \hat\mu_i^2/2\}\\ &=& \sum_{i=1}^N (y_i^2-2y_i\hat\mu_i+\hat\mu_i^2)\\ &=& \sum_{i=1}^N (y_i-\hat\mu_i)^2. \mbox{ (residual sum of squares)} \end{eqnarray*}\]
Note that, since \(\phi=\sigma^2\), the scaled deviance is \[ \frac{D(y;\hat\mu)}{\phi}=\sum_{i=1}^N \frac{(y_i-\hat\mu_i)^2}{\sigma^2}. \]
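A quick numerical confirmation that the Gaussian GLM deviance is the residual sum of squares (a sketch on simulated data; statsmodels assumed):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Gaussian()).fit()
rss = np.sum((y - fit.fittedvalues) ** 2)
print(rss, fit.deviance)  # identical up to floating-point error
```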
In some cases the scaled deviance is exactly \(\chi^2_{N-p}\), where \(p\) is the number of \(\beta\) parameters (e.g., Normal).
In others, the scaled deviance \(\xrightarrow{d}\chi^2_{N-p}\) only asymptotically (e.g., Poisson with large \(\mu_i\)).
In still others, it need not be asymptotically \(\chi^2\) at all (e.g., binary responses with continuous explanatory variables); the simulation sketch below illustrates the contrast.
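A small Monte Carlo sketch of the last two points (all data simulated; statsmodels assumed): \(\chi^2_{N-p}\) has mean \(N-p\), and the average Poisson deviance with large means should sit near it, while the average binary deviance typically does not.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
N, reps = 40, 500
x = rng.normal(size=N)
X = sm.add_constant(x)

dev_pois, dev_bin = [], []
for _ in range(reps):
    y_p = rng.poisson(np.exp(3.0 + 0.2 * x))             # large mu_i
    y_b = rng.binomial(1, 1 / (1 + np.exp(-0.2 * x)))    # binary response
    dev_pois.append(sm.GLM(y_p, X, family=sm.families.Poisson()).fit().deviance)
    dev_bin.append(sm.GLM(y_b, X, family=sm.families.Binomial()).fit().deviance)

# chi^2_{N-p} has mean N - p = 38: the Poisson average should be close to it,
# the binary average typically is not.
print(np.mean(dev_pois), np.mean(dev_bin), N - 2)
```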