Data: \(Y_i=(Y_{i1},\ldots,Y_{iT})\), \(x_i= (x_{i1},\ldots,x_{iT})\) for \(i=1,\ldots,N\)(반복 측정 data).
For example, repeated measure in longitudinal study: \[ \mu_{it}=E(Y_{it}). \]
We have two types of models
Marginal Models: \[ g(\mu_{it})=x_{it}'\beta, \mbox{ }\mbox{ }t=1,\ldots,T. \] To fit by ML, we need to assume joint distribution for \((Y_{i1},\ldots,Y_{iT})\) to get a likelihood function.
Note that the likelihood equation is \[ \sum_i \left(\frac{d\mu_i}{d\beta}\right)\text{Cov}(Y_i)^{-1}(Y_i-\mu_i(\beta))=0. \]
Overdispersion에 대한 스토리이다.
Density를 이용하여 Quasi likelihood도 쓸수 있다(likelihood equation의 summation이 j에 대해서 늘어난 경우 Covariance modeling을 하면 된다).
하지만 GEE가 제일 많이 쓰인다.
Let \(Y_i=(Y_{i1},\ldots,Y_{iT})\) be the outcomes for subject \(i\), \(i=1,\ldots,N\).
Let \(x_{it}=p\times 1\) be vectors of explanatory variables for \(y_{it}\); \[ X_i=\begin{pmatrix} x_{i1} \\\vdots \\ x_{iT}\end{pmatrix}. \] Let \(\eta_{it}\) be linear predictors such that \(\eta_{it}=g(\mu_{it})=x_{it}'\beta\), \(t=1,\ldots,T\) for a link function \(g\).
Assume \(y_{it}\) has density or mass function \[ f(y_{it}; \theta_{it},\phi)= \exp\left[\frac{y_{it}\theta_{it}-b(\theta_{it})}{\phi}+c(y_{it};\phi)\right]. \] Recall \(\mu_{it}=b'(\theta_{it})\) and \(\text{Var}(Y_{it})=\phi b''(\theta_{it})\).
Assume working correlation matrix \(R(\alpha)\) depends on parameter \(\alpha\): \[ R(\alpha)=\begin{cases}\text{corr}(y_{it},y_{it'})=\alpha \mbox{ }\mbox{ for all }t,t',& \mbox{exchangeable};\\\text{corr}(y_{it},y_{it'})=\alpha^{|t-t'|} & \mbox{autoregressive;} \\R(\alpha)=I,&\mbox{independent;} \\ \text{corr}(y_{it},y_{it'})=\alpha_{tt'}(=\alpha_{t't}), & \mbox{unstructured.}\end{cases} \] Let \(b(\theta_i)=(b(\theta_{i1}),\ldots,b(\theta_{iT}))\). So, \[\begin{eqnarray*} \mu_i&=&b'(\theta_i),\\ B_i&:=&\text{diag}\{b''(\theta_i)\}\\ &=& \text{diag}\{b''(\theta_{i1}),\ldots,b''(\theta_{iT}) \}. \end{eqnarray*}\]
Woring covariance matrix for \(y_i\) is \[
V_i=\phi \mbox{ }B_i^{1/2}R(\alpha)B_i^{1/2}=\text{Cov}(Y_i)
\] if \(R(\alpha)=\)true correlation matrix.
Let \(\Delta_i=\text{diag}\left(\frac{d\theta_{i1}}{d\eta_{i1}}\ldots,\frac{d\theta_{iT}}{d\eta_{iT}}\right).\)
Since \(\frac{d\mu_{it}}{d\beta_j}= \frac{d\mu_{it}}{d\theta_{it}}\frac{d\theta_{it}}{d\eta_j}\frac{d\eta_{it}}{d\beta_j}\), \(D_i:=\frac{d\mu_i}{d\beta}=\frac{db'(\theta_i)}{d\beta}=B_i\Delta_iX_i\) \(\left(\frac{d\mu_{it}}{d\theta_{it}}=\frac{db'(\theta_{it})}{d\theta_{it}}=b''(\theta_{it})\right)\).
Recall for univariate GLMs and QL, some equations have form \[ \sum_i \left(\frac{d\mu_i}{d\beta}\right)V(\mu_i)^{-1}(Y_i-\mu_i(\beta))=0. \]
The multivariate analog is the set of Generalized Estimating Equations \[ \sum_i D_i'V_i^{-1}(Y_i-\mu_i(\beta))=0 \mbox{ }\mbox{ (}p\mbox{ equations)}. \]
Then, Fisher scoring algorithm can be iterated between for solving the GEE’s, and moment estimation of \(\phi\) and \(\alpha\) can be used from residuals.
Under weak regularity conditions, \[ \sqrt{n}(\hat\beta-\beta)\stackrel{d}\rightarrow N(0,V_G) \] with \[ V_G= \left[ \sum_i D_i'V_i^{-1}D_i \right]^{-1}\times \left[ \sum_i D_i'V_i^{-1} \text{Cov}(Y_i)V_i^{-1}D_i \right]\left[ \sum_i D_i'V_i^{-1}D_i \right]^{-1}. \] If working correlation assumptions are correct, \[ V_G=\phi \left[ \sum_i D_i'V_i^{-1}D_i \right]^{-1}. \]
Assume \(R(\alpha)=I\)(independent) and canonical link\((\Delta_i=1\) for all \(i)\). Then, \[\begin{eqnarray*} \sum_i D_i'V_i^{-1}(y_i-\mu_i(\beta))&=& \sum_i X_k'\Delta_iB_iV_i^{-1}\{Y_i-\mu_i(\beta)\}\\ &=& \sum_i \frac{1}{\phi}X_k'\Delta_i\{Y_i-\mu_i(\beta)\}\\ &=& \sum_i \frac{1}{\phi}X_k'\{Y_i-\mu_i(\beta)\}=0\\ \implies&& \sum_i X_i'y_i=\sum_i X_i'\mu_i. \end{eqnarray*}\] So, \(\hat\beta\)=ordinary ML estimator for GLM treating \((y_{i1},\ldots,y_{it})\) as independent.
Multivariate responses(repeated measures)에 대해서 Marginai modeling을 할 때 똑같이 Likelihood equation을 사용한다. 단지 Dimension이 늘어났을 뿐이다.
Overdispersion을 해결하기 위해서 똑같이 \(\text{Cov}(Y_i)\)에 \(\phi\)를 함께 고려해주고, \(\hat\phi\)는 Pearson chisquare를 통해 moment estimator를 대입한다.