3.1. General Framework for k-Sample Tests

Remark (Common Framework)

We have random variables

\(X_1^{(1)}, \ldots X_{n_1}^{(1)} \stackrel{\text{iid}}{\sim} F^{(1)}, \mbox{ }\mbox{ }\mbox{ }\mbox{ }Y_1^{(1)}, \ldots Y_{n_1}^{(1)} \stackrel{\text{iid}}{\sim} G^{(1)}\) (population 1)

\(X_1^{(2)}, \ldots X_{n_2}^{(2)} \stackrel{\text{iid}}{\sim} F^{(2)}, \mbox{ }\mbox{ }\mbox{ }\mbox{ }Y_1^{(2)}, \ldots Y_{n_2}^{(2)} \stackrel{\text{iid}}{\sim} G^{(2)}\) (population 2)

\(\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\mbox{ }\vdots\)

\(X_1^{(k)}, \ldots X_{n_k}^{(k)} \stackrel{\text{iid}}{\sim} F^{(k)}, \mbox{ }\mbox{ }\mbox{ }\mbox{ }Y_1^{(k)}, \ldots Y_{n_k}^{(k)} \stackrel{\text{iid}}{\sim} G^{(k)}\) (population \(k\))

여기서 우리는 \(H_0:F_1=F_2=\cdots=F_k\)를 test하고 싶다고 하자. 즉 모든 \(k\)개의 population이 다 같은 survival curve를 갖고 있는지를 알고 싶다. 하지만 이것보다 우리는 Hazard를 아래와 같이 비교하는게 더욱 편하다: \[ H_0: A_1=A_2=\cdots =A_k \mbox{ }\mbox{ }\mbox{ }\mbox{ which is equivalent to }\mbox{ }\mbox{ }\mbox{ } H_0:F_1=F_2=\cdots=F_k. \] 이 두 테스트는 사실 같은 test이다. 왜냐하면 Hazard와 Failure function은 초반에 살펴보았다시피 one-to-one relationship을 갖고 있기 때문이다.
Hazard를 비교하는 것이 더 쉬운 이유는 다음에서 다룰 것이지만 Martingale property를 이용할 수 있기 때문이다.

Remark

We assume that we have \(k\) functions \(A_1,\ldots,A_k\) and for each \(h\) \((h=1,\ldots,k)\) we have a counting process \(N_h\) satisfying the Aalen model, i.e., \[ N_h(t)-\int_0^t Y_h(s)dA_h(s)\mbox{ }\mbox{ }\mbox{ }\mbox{ is a martingale}, \] where \(Y_h\) is predictable with respect to some filtration.

Counting process는 “Filtration”과 항상 같이 등장한다는 것을 기억해야 한다.

We want to test the null hypothesis \[ H_0: A_1=\cdots =A_k. \] We assume that the vector \((N_1\ldots,N_k)\) forms a \(k\)-dimensional counting process. That is, we assume that the counting processes never jump at the same time, and that the counting processes and predictable processes are all defined with respect to the same filtration.

즉, \(k\)개의 counting processes에 대한 joint(higher) filtration 한 개가 존재한다는 것이다.
이 가정이 필요한 이유는 앞으로 우리가 이전에 배웠던 \(\mathcal{F}_t\)-orthogonality를 사용하기 위해서다.
사실 같은 시점에 jump가 여러번 이루어지지 않는다는 가정은 필요하지 않고, 모든 counting processes들이 independent하다는 가정만 있으면 충분하다.

For each \(h\), let \(\hat{A}_h\) be the Nelson-Aalen estimator of \(A_h\), and let \(\hat A\) be the Nelson-Aalen estimator based on the pooled sample, i.e., \[\begin{equation} \hat A(t)= \int_0^t \frac{I(Y_\cdot (s)>0)}{Y_\cdot (s)}dN_\cdot (s) \end{equation}\] where \[ N_\cdot (s)= \sum_{h=1}^k N_h(s)\mbox{ }\mbox{ }\mbox{ and }\mbox{ }\mbox{ } Y_\cdot (s)= \sum_{h=1}^k Y_h(s) \]

pooled sample은 결국 모든 샘플들을 한꺼번에 다 합쳐놓은 것이다.
여기서부터 ANOVA랑 굉장히 비슷해지는데, \(\hat A(t)\)는 ANOVA에서의 \(X_{\cdot \cdot}\)(grand mean)과 유사하다고 생각하면 편하다.

Assume \(H_0\) is true. Let \(A\) be the common value of the \(A_h\)’s. Let \[ M_h(t)=N_h(t)-\int_0^t Y_h(s)dA(s) \] and \[ M_\cdot(t)=N_\cdot(t)-\int_0^tY_\cdot(s)dA(s) \mbox{ }\mbox{ }\mbox{ }\mbox{ }\left( =\sum_{h=1}^k M_h(t) \right). \] Then, \(M_\cdot(t)\) is a martingale, since the sum of martingale is again a martingale. Hence, the common value \(A\) may be estimated by \(\hat A(t)= \int_0^t \frac{I(Y_\cdot (s)>0)}{Y_\cdot (s)}dN_\cdot (s)\) (처음에 배웠던, 양변에 derivative를 하여 구하는, Nelson Aalen공식 추론 방법으로 바로 구한다).

The general approach is to compare \(\hat A_h\) with \(\hat A\) through a weight process, then combine the \(k\) comparisons via quadratic form.

즉 \(H_0\)이 true라고 한다면, \(A_1=\cdots =A_k=A\)이기 때문에 \(\hat A_h(t)= \int_0^t \frac{I(Y_h(s)>0)}{Y_h(s)}dN_h(s)\)와 pooled sample을 통해 구한 \(\hat A(t)\)의 차의 (weighted sum)\(^2\)가 값이 커서는 안된다는 논리이다. 즉 predictable weight process \(W(t)\) 에 대해 다음과 같이 weight sum을 앞으로 고려할 것이다: \[ \int_0^t W(s)\left(d\hat A(s)-d\hat A_h(s)\right). \] 만약 \(\hat A\)와 \(\hat A_h\)가 모두 martingale이라면, 이 weight sum 또한 martingale이다 (by Theorem 5).
나중에 다루겠지만, weight가 어떤 것이냐에 따라서 test의 종류가 달라진다. Mantel-Haenszel인지 Gehan인지 등으로 나누어지는데 매우 중요하다.

back