Abstract
BACKGROUND AND PURPOSE: Early ischemic changes on pretreatment NCCT quantified using ASPECTS have been demonstrated to predict outcomes after IAT. We sought to determine the interobserver reliability of ASPECTS for patients with AIS with PAO and to determine whether pretreatment ASPECTS dichotomized at 7 would demonstrate at least substantial κ agreement.
MATERIALS AND METHODS: From our prospective IAT database, we identified consecutive patients with anterior circulation PAO who underwent IAT over a 6-year period. Only those with an evaluable pretreatment NCCT were included. ASPECTS was graded independently by 2 experienced readers. Interrater agreement was assessed for total ASPECTS, dichotomized ASPECTS (≤7 versus >7), and each ASPECTS region. Statistical analysis included determination of Cohen κ coefficients and concordance correlation coefficients. PABAK coefficients were also calculated.
RESULTS: One hundred fifty-five patients met our study criteria. Median pretreatment ASPECTS was 8 (interquartile range 7–9). Interrater agreement for total ASPECTS was substantial (concordance correlation coefficient = 0.77). The mean ASPECTS difference between readers was 0.2 (95% limits of agreement, −2.8 to 2.4). For dichotomized ASPECTS, there was a 76.8% (119/155) observed rate of agreement, with a moderate κ = 0.53 (PABAK = 0.54). By region, agreement was worst in the internal capsule and the cortical areas, ranging from fair to moderate. After adjusting for prevalence and bias, agreement improved to substantial or near perfect in most regions.
CONCLUSIONS: Interobserver reliability is substantial for total ASPECTS but is only moderate for ASPECTS dichotomized at 7. This may limit the utility of dichotomized ASPECTS for IAT selection.
ABBREVIATIONS:
- AIS: acute ischemic stroke
- ASPECTS: Alberta Stroke Program Early CT Score
- CI: confidence interval
- IAT: intra-arterial stroke therapy
- IQR: interquartile range
- M1: MCA stem
- M2: MCA second-order branch
- PABAK: prevalence-adjusted, bias-adjusted kappa
- PAO: proximal artery occlusion
NCCT is the most widely used imaging technique in the evaluation of acute stroke because of its availability, speed, and accuracy for ruling out intracranial hemorrhage.1,2 In patients with AIS, early reperfusion therapy improves outcomes in patients with small infarcts3,4 but is harmful in those with extensive injury (eg, greater than 33% of the MCA territory).5 Unfortunately, during the first several hours of acute ischemic stroke, signs of tissue injury on NCCT are subtle, and interobserver agreement is limited.6 ASPECTS is a semiquantitative grading system developed to improve the assessment of early ischemic changes in the MCA territory on NCCT.7
In recent studies, baseline (pretreatment) dichotomized ASPECTS has been demonstrated to predict good outcomes after IAT.8–10 Yet before this approach is used clinically to select patients for IAT, it should demonstrate good interobserver agreement. The literature is conflicting in this regard, with κ values ranging from fair to substantial.7,11 However, these studies included a broad range of stroke severities and may not necessarily apply to the population eligible for IAT, namely, patients with proximal cerebral artery occlusions.
We sought to evaluate the reliability of baseline ASPECTS, specifically in patients with anterior circulation PAO undergoing IAT, and how this would affect treatment decision-making for IAT. Specifically, we aimed to determine whether pretreatment ASPECTS dichotomized at 7 would demonstrate at least substantial κ agreement. We also examined the interobserver reliability of ASPECTS by topographic region.
Materials and Methods
From our prospective observational IAT database, we identified 169 consecutive patients with AIS with anterior circulation PAO (terminal ICA and/or MCA M1 or M2 segment occlusion) who underwent intra-arterial treatment between June 2004 and May 2010. General inclusion criteria for IAT included the following: 1) imaging assessment completed <8 hours post ictus; 2) NCCT without intracranial hemorrhage; 3) NCCT hypodense ischemic lesion or MRI DWI hyperintense ischemic lesion <1/3 MCA territory; 4) proximal artery occlusion on CT angiography; 5) NIHSS score >8 or with significant aphasia; 6) if IV rtPA was administered, no treatment response. The additional inclusion criterion for this study was an evaluable pretreatment noncontrast head CT scan. Of the 169 patients, 14 were excluded because of chronic infarcts ipsilateral to the acute stroke or lack of pretreatment NCCT imaging at our hospital, leaving 155 patients for analysis. Clinical and imaging data were retrospectively analyzed. This study was approved by our local institutional review board and was Health Insurance Portability and Accountability Act compliant.
Imaging Protocol and Analysis
NCCT scans of the head were performed in helical mode (1.25-mm thickness, kV 120, mA 250) and reconstructed in the axial plane with 5-mm section thickness. This imaging was evaluated independently by 2 experienced neuroradiologists using the ASPECTS system. Contrary to the original scoring system, which utilized only 2 brain sections,7 readers graded early ischemic change in each of the 10 ASPECTS regions (Fig 1) according to current methodology, which utilizes all of the scan images.12 Early ischemic change was defined as tissue hypoattenuation or loss of gray–white matter differentiation, as these changes have been associated with edema and irreversible injury. Isolated cortical swelling, which was part of the original ASPECTS criteria, was not included, as recent work has demonstrated that it is associated with increased cerebral blood volume and may represent penumbral (threatened but salvageable) tissue.13 Discrepancies between readers were resolved by consensus adjudication. Window and level settings were adjusted at the discretion of the readers to increase contrast between normal and ischemic brain. ASPECTS was calculated by subtracting the number of affected regions from a total possible score of 10. Imaging review was performed blinded to all clinical information except stroke side.
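As a minimal sketch of the scoring rule described above (not the readers' actual workflow), the following Python computes ASPECTS as 10 minus the number of affected regions and applies the ≤7 versus >7 split used in this study. The region labels, function names, and example patient are illustrative assumptions.

```python
# Illustrative sketch only: region labels and function names are assumptions,
# not taken from the study. Here M1-M6 denote the ASPECTS cortical regions,
# not MCA segments.
ASPECTS_REGIONS = (
    "caudate", "lentiform", "internal_capsule", "insula",
    "M1", "M2", "M3", "M4", "M5", "M6",
)


def aspects_score(affected_regions):
    """Return 10 minus the number of distinct affected ASPECTS regions."""
    affected = set(affected_regions)
    unknown = affected - set(ASPECTS_REGIONS)
    if unknown:
        raise ValueError(f"Unknown ASPECTS region(s): {sorted(unknown)}")
    return len(ASPECTS_REGIONS) - len(affected)


def is_above_threshold(score, threshold=7):
    """Dichotomize: True for ASPECTS > threshold, False for ASPECTS <= threshold."""
    return score > threshold


# Hypothetical patient with early ischemic change in the insula and lentiform nucleus.
example_score = aspects_score(["insula", "lentiform"])
print(example_score, is_above_threshold(example_score))
```

A scan with no affected regions scores 10, and a scan with all 10 regions affected scores 0.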
Statistical Analysis
Interrater agreement was measured for total ASPECTS, dichotomized ASPECTS (≤7 versus >7), and each individual topographic region. ASPECTS was dichotomized at 7, given recent evidence supporting this threshold for identifying good clinical response to intra-arterial treatment or reperfusion.8–10 For total ASPECTS, agreement was measured by using the concordance correlation coefficient. In addition, Bland and Altman14 analysis was performed to assess the absolute degree of interrater differences across the entire ASPECTS scale. For dichotomized ASPECTS and for each ASPECTS region, Cohen κ coefficients were calculated. In addition, because κ scores are influenced by prevalence and bias, PABAK scores were calculated and reported with their associated prevalence and bias indices.15,16 Interpretation of the κ values followed the proposed standards of Landis and Koch: 0–0.20 (slight); 0.21–0.40 (fair); 0.41–0.60 (moderate); 0.61–0.80 (substantial); and 0.81–1.00 (almost perfect).17 Continuous data with normal distribution were reported as mean ± standard deviation, ordinal or non-normal data were reported as median (IQR), and categoric data were reported as proportions. Normality was tested by using the Kolmogorov-Smirnov test. Statistical significance was taken at P < .05. Statistical analysis was performed by using MedCalc, version 11.2.1 (MedCalc Software, Mariakerke, Belgium).
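The agreement statistics named above can be sketched for a 2×2 table of two readers' binary ratings as follows. This is a minimal illustration using the standard textbook formulas, not the MedCalc implementation used in the study; the function names and the example counts are assumptions chosen only to give round numbers.

```python
# Illustrative sketch: function names and example counts are assumptions.
def agreement_stats(a, b, c, d):
    """Agreement statistics for two readers' binary ratings.

    a: both readers positive; d: both readers negative;
    b: reader 1 positive and reader 2 negative; c: the reverse.
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed agreement
    # Chance agreement expected from each reader's marginal totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return {
        "observed_agreement": p_o,
        "kappa": (p_o - p_e) / (1 - p_e),
        "pabak": 2 * p_o - 1,
        "prevalence_index": abs(a - d) / n,
        "bias_index": abs(b - c) / n,
    }


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch category listed in the text."""
    if kappa < 0:
        return "poor"  # from the original Landis and Koch scale; not listed above
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts, not study data.
stats = agreement_stats(a=40, b=10, c=5, d=45)
print(round(stats["kappa"], 2), round(stats["pabak"], 2), landis_koch(stats["kappa"]))
```

PABAK depends only on the observed agreement, whereas κ also depends on the readers' marginal totals, which is why the two can diverge when one category is rare.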
Results
A total of 155 patients met study criteria. The cohort included 80 female patients, with an average age of 67.7 ± 17.2 years. Seventy strokes involved the right hemisphere. The median baseline NIHSS score was 17 (IQR 14–20). There were 140 isolated intracranial occlusions: 33 occlusions of the terminal ICA with or without extension into the M1 segment, 92 M1 MCA occlusions, and 15 M2 MCA occlusions. There were 15 tandem occlusions of the cervical carotid and intracranial ICA and/or MCA. The mean duration from stroke onset to NCCT imaging was 199 ± 130 minutes.
The distribution of consensus ASPECTS within the cohort was skewed toward higher ASPECTS (smaller infarcts), with 97 patients demonstrating ASPECTS >7 (Fig 2). The median baseline ASPECTS was 8 (IQR 7–9). There was substantial interrater agreement for total ASPECTS grading, with a concordance correlation coefficient of 0.77 (95% CI, 0.70–0.83). In the Bland-Altman analysis (Fig 3), there was a small but statistically significant difference between the 2 readers' mean ASPECTS (7.4 versus 7.6; P < .05). The 95% limits of agreement for the interrater ASPECTS differences ranged from −2.8 to +2.4. ASPECTS was identical between readers in 34.2% (53/155), within 1 point in 76.8% (119/155), and within 2 points in 91.6% (142/155). There were no differences greater than 3 points.
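For readers unfamiliar with the two measures reported above, the sketch below computes a concordance correlation coefficient and Bland-Altman limits of agreement for paired scores. It assumes Lin's definition of the coefficient with 1/n variances and limits of mean difference ± 1.96 standard deviations; statistical packages may differ in these details. The function names and paired scores are hypothetical, not study data.

```python
from statistics import mean, stdev


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient (assumes 1/n variances)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    var_y = sum((yi - my) ** 2 for yi in y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    # Penalizes both scatter and any systematic shift between the readers.
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def bland_altman(x, y):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    md = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return md, md - 1.96 * sd, md + 1.96 * sd


# Hypothetical ASPECTS reads from two readers for five scans.
reader_1 = [8, 7, 9, 6, 10]
reader_2 = [8, 8, 9, 5, 10]
print(round(concordance_ccc(reader_1, reader_2), 3))
print(tuple(round(v, 2) for v in bland_altman(reader_1, reader_2)))
```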
For dichotomized baseline ASPECTS (≤7 versus >7), there was a 76.8% (119/155) observed rate of agreement. Accounting for chance, there was moderate agreement, with κ = 0.53 (PABAK = 0.54; bias index = 0.04; prevalence index = 0.11).
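The reported PABAK can be checked directly from the observed agreement, because PABAK equals 2 × (observed agreement) − 1. The short calculation below uses only the counts given above; the variable names are illustrative. Cohen κ cannot be recomputed this way, since it also requires the full 2×2 table of the readers' ratings.

```python
# Counts reported in the text: 119 concordant dichotomized reads out of 155.
concordant, total = 119, 155

observed_agreement = concordant / total
pabak = 2 * observed_agreement - 1
discordant = total - concordant

print(f"{observed_agreement:.3f}", f"{pabak:.2f}", discordant)
```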
Fig 4 illustrates the prevalence of ischemic involvement for each ASPECTS region using the consensus reads. The basal ganglia and insula were the most frequently affected regions. The table provides the κ and PABAK values for the individual ASPECTS regions. Interrater agreement was variable, with the worst performance in the internal capsule and the cortical regions. The internal capsule and M6 region demonstrated only fair agreement. When accounting for bias and prevalence, agreement improved to at least substantial in all regions except the lentiform nucleus (moderate).
Discussion
Among highly experienced readers, there was substantial interobserver agreement for grading baseline ASPECTS on acute noncontrast head CT in the setting of anterior circulation proximal artery occlusion. Most differences in ASPECTS were within 1–2 points in this study. However, only moderate interrater agreement was achieved for dichotomized ASPECTS (>7 versus ≤7). In approximately 25% of our cohort, the 2 readers fell on opposite sides of this threshold, so treatment decisions for IAT based on dichotomized ASPECTS would have depended on the reader.
Our findings with respect to total ASPECTS scoring support a previous study that also demonstrated substantial agreement (weighted κ = 0.69) across the entire ASPECTS scale.18 In that study, the authors found a mean interrater difference of zero, with a standard deviation of 1.1 points. Similarly, we found a mean difference of 0.2, with 76.8% of scores being within 1 point.
Previous studies have shown that baseline ASPECTS >7 identifies patients who are more likely to benefit from endovascular treatment.8–10 However, based on our findings, the utility of this approach may be adversely affected by its limited interrater reliability. The literature is conflicting with respect to this question. The original ASPECTS article by Barber et al7 reported substantial to near-perfect agreement for ASPECTS dichotomized at 7, with κ values ranging from 0.71 to 0.89. However, another study by Mak et al11 demonstrated a κ score of 0.34 (fair), with a strikingly low 42% rate of observed agreement. When adjusting for prevalence and bias, agreement improved to moderate (PABAK = 0.44). Our results confirm only a moderate level of reliability (κ = 0.53, PABAK = 0.54) for dichotomized ASPECTS in the setting of proximal occlusions.
It is unclear exactly why the interrater agreement for dichotomized ASPECTS was so different between the study by Barber et al7 and ours. However, this difference is probably related to variable selection bias and consequently differing patient cohorts. Timing of imaging is 1 factor that can influence lesion detection. In their study, patients underwent baseline NCCT within 3 hours of stroke onset and were treated with intravenous alteplase, while in the present study, at least half of the patients were imaged beyond 3 hours (median 188.5 minutes, IQR 83–278 minutes) and were treated with IAT. However, the increased time to imaging in our study would be expected to yield more conspicuous lesions and hence a better agreement rather than a worse one. Another possible explanation for the varying agreement is a difference in ASPECTS distribution. Although the median baseline ASPECTS was 8 in both studies, a broader distribution (ie, larger variance) in 1 study would result in a smaller proportion of scores around the threshold of 7 and therefore fewer potential discrepancies. Unfortunately, Barber et al7 did not report the proportions of individual scores in their study. In our cohort, almost half (48.4% [75/155]) of the consensus scores were 7 or 8 and thus prone to interrater disagreement.
This last issue underscores the problem with using a binary classification scheme. Clearly, dichotomization will penalize scores that are discrepant by only 1 point if they are around the threshold value. Therefore, despite our substantial agreement for total ASPECTS, we achieved only moderate agreement for dichotomized ASPECTS because of the clustering of scores around 7. This limitation may particularly affect favorable IAT candidates (ie, those who present with minimal tissue injury). If a small infarct is present in the setting of a proximal occlusion, it typically involves the caudate nucleus, lentiform nucleus, and/or insula, which are end-artery territories. As such, the baseline ASPECTS is often 7 or 8 in these patients. Also in such patients, the moderate level of agreement that we observed for the lentiform nucleus may further contribute to the relatively low rate of agreement for dichotomized ASPECTS.
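The threshold effect described above can be illustrated with a few hypothetical score pairs: a 1-point difference counts as a disagreement when it straddles 7, whereas a larger difference on one side of the threshold does not. The function name and the pairs are assumptions made for illustration only.

```python
def dichotomized_disagreement(score_a, score_b, threshold=7):
    """True when two ASPECTS reads fall on opposite sides of the threshold."""
    return (score_a > threshold) != (score_b > threshold)


# Hypothetical reader pairs (reader 1, reader 2), not study data.
pairs = [(7, 8), (9, 10), (5, 7), (8, 8), (6, 8)]
for a, b in pairs:
    # Print the absolute score difference alongside the dichotomized result.
    print(a, b, abs(a - b), dichotomized_disagreement(a, b))
```

Here the pair (7, 8) differs by 1 point yet is discordant, while (5, 7) differs by 2 points and is concordant.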
Our findings with respect to individual ASPECTS regions are novel, as no study, to our knowledge, has reported interobserver agreement by region. Notably, there was only fair to moderate agreement in the internal capsule and the cortical regions. However, this was mostly related to the low prevalence of ischemic involvement in our study. When accounting for prevalence and bias, these areas demonstrated substantial or near-perfect agreement, suggesting that there was strong consensus between the readers as to what was considered abnormal. It is somewhat surprising that the agreement for total ASPECTS was high despite the low level of agreement in these multiple regions. This suggests that overall agreement is determined largely by agreement within the more prevalent ASPECTS regions (eg, basal ganglia, insula).
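The dependence of κ on prevalence can be seen in a small sketch that compares two hypothetical 2×2 tables with the same observed agreement (94%) but different prevalence of involvement. The counts and the function name are invented for illustration and are not the regional data from this study.

```python
def kappa_and_pabak(a, b, c, d):
    """Return (kappa, PABAK) for a 2x2 table of two readers' binary ratings.

    a: both readers positive; d: both negative; b and c: discordant cells.
    """
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_o - p_e) / (1 - p_e), 2 * p_o - 1


# Hypothetical rarely involved region: 4 joint positives, 90 joint negatives.
rare = kappa_and_pabak(a=4, b=3, c=3, d=90)
# Hypothetical balanced region with the same 94% observed agreement.
balanced = kappa_and_pabak(a=47, b=3, c=3, d=47)

print([round(v, 2) for v in rare])      # kappa is much lower than PABAK
print([round(v, 2) for v in balanced])  # kappa equals PABAK
```

With identical observed agreement, the rarely involved region yields a κ in the moderate range while its PABAK is almost perfect, which mirrors the pattern described above for the internal capsule and cortical regions.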
This study is limited primarily by our highly selected patient population, all of whom underwent IAT. This explains why our ASPECTS distribution was skewed toward smaller infarcts, which are thought to be more likely to benefit from treatment. However, it is in this specific patient subset where interobserver agreement is most important. Moreover, for the analysis of dichotomized ASPECTS, and of the individual ASPECTS regions, we were able to mitigate this issue by adjusting for prevalence. Another limitation is that we did not assess intraobserver reliability. However, we believe that interobserver agreement is more relevant with respect to the clinical utility of ASPECTS. In order for NCCT ASPECTS to be widely adopted for selecting patients for IAT, it must have good reliability between readers at different medical centers. A further limitation is that our findings cannot be extended to general clinical practice, where stroke-imaging studies are often interpreted by physicians who are not expert at ASPECTS grading. Previous studies have shown that the accuracy of ASPECTS evaluation is dependent on the level of rater experience.7,19,20 However, we wanted to assess the reliability of ASPECTS under the most ideal conditions (ie, using readers with significant experience with stroke imaging and ASPECTS grading). In support of this last point, a recent study by Wardlaw et al21 found that the reliability of grading acute ischemia on NCCT (versus an expert neuroradiologist) was improved if the reader was a neuroradiologist and if the scan was acquired later in the 6-hour window. In our study, both readers were neuroradiologists and at least half of the scans were acquired after 3 hours. Future studies are needed to verify our findings in an unselected population of patients with AIS with proximal occlusions.
Conclusions
Among patients with anterior circulation proximal artery occlusions who are eligible for intra-arterial therapy, interrater reliability for ASPECTS grading is substantial across the entire ASPECTS scale. However, when using dichotomized ASPECTS (≤7 versus >7) for treatment selection, agreement is only moderate, which limits the utility of this approach. In our cohort, approximately 25% of treatment decisions would be affected by interrater reliability.
Footnotes
Paper previously presented at: Annual Meeting of the Radiological Society of North America, November 27–December 2, 2011; Chicago, Illinois.
Disclosures: Joshua Hirsch—UNRELATED: Consultancy: Phillips, CareFusion, Comments: Participated in focus group of NI specialists for NI suites of the future (Phillips); participated on NextGen team for product development (CareFusion); Royalties: CareFusion, Comments: as above; Stock/Stock Options: IntraTech, Nevro, NFocus, Comments: Development stage companies that are working on NI products (IntraTech and N-Focus) and nerve imaging (Nevro). Albert Yoo—UNRELATED: Grants/Grants Pending: Penumbra* * Money paid to institution
References
- Received September 7, 2011.
- Accepted after revision October 2, 2011.
- © 2012 by American Journal of Neuroradiology