Abstract
BACKGROUND AND PURPOSE: Despite the availability of advanced imaging, noninvasively distinguishing radiation necrosis from recurrent brain tumors remains a major challenge in neuro-oncology. Our aim was to determine the feasibility of radiomic (computer-extracted texture) features in differentiating radiation necrosis from recurrent brain tumors on routine MR imaging (gadolinium T1WI, T2WI, FLAIR).
MATERIALS AND METHODS: A retrospective study of brain tumor MR imaging acquired 9 months (or later) post-radiochemotherapy was performed at 2 institutions. Fifty-eight patient studies were analyzed, consisting of a training (n = 43) cohort from one institution and an independent test (n = 15) cohort from another, with surgical histologic findings confirmed by an experienced neuropathologist at the respective institutions. Brain lesions on MR imaging were manually annotated by an expert neuroradiologist. A set of radiomic features was extracted for every lesion on each MR imaging sequence: gadolinium T1WI, T2WI, and FLAIR. Feature selection was used to identify the top 5 most discriminating features for every MR imaging sequence on the training cohort. These features were then evaluated on the test cohort by a support vector machine classifier. The classification performance was compared against diagnostic reads by 2 expert neuroradiologists who had access to the same MR imaging sequences (gadolinium T1WI, T2WI, and FLAIR) as the classifier.
RESULTS: On the training cohort, the area under the receiver operating characteristic curve was highest for FLAIR: 0.79 (95% CI, 0.77–0.81) for the primary (n = 22) and 0.79 (95% CI, 0.75–0.83) for the metastatic (n = 21) subgroups. Of the 15 studies in the holdout cohort, the support vector machine classifier identified 12 correctly, while neuroradiologist 1 diagnosed 7 of 15 and neuroradiologist 2 diagnosed 8 of 15 studies correctly.
CONCLUSIONS: Our preliminary results suggest that radiomic features may provide complementary diagnostic information on routine MR imaging sequences that may improve the distinction of radiation necrosis from recurrence for both primary and metastatic brain tumors.
ABBREVIATIONS:
- AUC: area under receiver operating characteristic curve
- Gd: gadolinium
- mRmR: minimum redundancy and maximum relevance
- RN: radiation necrosis
- RT: radiation therapy
- SVM: support vector machine
Treatment of malignant brain tumors involves a combined approach of surgical resection, radiation therapy (RT), and, depending on the histology, chemotherapy. Cerebral radiation necrosis (RN) is often an unavoidable complication of high-dose focal RT that typically manifests 6–9 months post-RT and mimics the symptoms and MR imaging appearance of tumor recurrence, in both primary and metastatic brain tumor cases.1 RN and tumor recurrence have substantially different treatment regimens and need to be identified expediently for determining prognosis, guiding subsequent therapy, and improving patient outcome.
Standard MR imaging2–4 remains the technique of choice for posttreatment evaluation of patients with brain tumor. The Response Assessment in Neuro-Oncology (RANO; http://radiopaedia.org/articles/rano-criteria-for-glioblastoma) criteria recommend using 2D measurements (diameter) of contrast enhancement on posttreatment gadolinium-enhanced (Gd) T1-weighted MR imaging (with respect to pretreatment MR imaging) as the fundamentally quantifiable imaging criteria for assessment of response to treatment. However, due to a similar appearance on follow-up posttreatment Gd-T1WI MR imaging, differentiating RN and tumor recurrence by using 2D measurements of contrast enhancement (as manually identified by an expert) is clinically extremely challenging.5 Recent studies have shown promise in using semiquantitative MR imaging measures such as apparent diffusion coefficient ratios6; choline, creatine, and N-acetylaspartate ratios from MR spectroscopy7; and perfusion imaging6 for differentiating RN from tumor recurrence. These techniques, however, may not be universally available, are often difficult to reproduce, and tend to increase the overall cost of the imaging examination. Hence, there is a need for identification of reliable noninvasive quantitative measurements on routinely acquired brain MR imaging (Gd-T1WI, T2WI, and fluid-attenuated inversion recovery) that can accurately distinguish RN from tumor recurrence.
The physiologic pathways leading to the development of RN and brain tumor recurrence are fundamentally different. Thus, there may be subtle variances in the morphologic appearance of the 2 conditions reflected as differences in the microarchitectural texture appearance embedded across Gd-T1WI, T2WI, and FLAIR, which might enable discrimination of RN from recurrent tumors.8 Radiomic or computer-extracted texture features allow capture of higher order quantitative measurements (eg, co-occurrence matrix homogeneity, neighboring gray-level dependence matrix, multiscale Gaussian derivatives) for modeling macro- and microscale morphologic attributes within the lesion area for every MR imaging protocol. Some of these radiomic features may not be visually appreciable by a radiologist but may complement their ability to make a more reliable diagnosis of the disease. Of late, there has been interest in the use of radiomic features computed from treatment-naïve MR imaging to distinguish patients with glioblastoma with long-term and short-term survival.9 Relatively little work, however, has focused on the use of radiomic analysis for distinguishing radiation necrosis from brain tumor recurrence on MR imaging.
The purpose of this study is to evaluate the feasibility of radiomic analysis on routine MR imaging sequences in identifying computer-extracted texture differences between RN and tumor recurrence that may not be visually appreciable on conventional MR imaging and to distinguish RN and recurrent cancer across primary and metastatic brain tumor studies. In this study, we identified a set of radiomic features that best distinguished RN from tumor recurrence on a training cohort across 3 routine multiparametric MR images (Gd-T1WI, T2WI, FLAIR). We then evaluated the validity of these radiomic features on a small holdout cohort and performed a head-to-head comparison of their performance against independent diagnostic reads by 2 expert neuroradiologists who were presented with the same routine MR imaging sequences as the classifier (Gd-T1WI, T2WI, and FLAIR). The ultimate goal of this work was to develop noninvasive techniques that can be used in conjunction with routine MR imaging protocols to complement a radiologist's diagnosis of RN versus tumor recurrence for improving patient management both for primary and metastatic brain tumors.
Materials and Methods
Study Population
The study population consisted of independent training and test cohorts obtained from the local (University Hospitals Case Medical Center) and the collaborating (University of Texas Southwestern Medical Center) institution and acquired for this institutional review board–approved and Health Insurance Portability and Accountability Act–compliant study. The 2 patient cohorts were identified by performing a retrospective review of neuropathology in all patients with brain tumor who underwent an operation for a recurrent or progressive Gd T1WI-enhancing lesion identified during follow-up at 9 months (or later) after the initial brain RT. Follow-up MR imaging scans within 0–21 days before the second resection or biopsy (for disease confirmation) were used for analysis. Inclusion criteria were that the pathology specimen be obtained by resection (preferably) or by multiple biopsies (>2) via stereotactic guidance. Single biopsies were not allowed because of the potential for sampling error. Histology was rereviewed by a neuropathologist (M.C. at the local and K.J.H. at the collaborating institution) blinded to the original diagnosis and type of RT, to quantify the percentage of RN and recurrent tumors.
To avoid any training errors due to “mixed” pathologies on the same lesion, for the training cohort, we strictly defined the presence of RN as ≥80% RN and of recurrent tumor as ≥80% recurrent tumor (other “mixed” cases with varying proportions of RN and tumor recurrence were excluded). We identified 43 cases at the local institution from 2006 to 2014 that met these strict inclusion criteria, consisting of 22 primary tumors (12 with recurrent tumor, 10 with RN) and 21 metastatic tumors (12 with recurrent tumor, 9 with RN). The test cohort consisted of 15 studies (11 primary and 4 metastatic cases) of patients who underwent an operation between 2009 and 2015 and in whom pathology confirmed either “predominant” RN or tumor recurrence, or a mixture of the two in varying proportions on the same lesion. Gd-T1WI scans were available for 10 of 15 studies; T2WI scans, for 7 of 15 studies; and FLAIR, for all 15 studies. Table 1 shows the summary of the study population. The details on MR imaging protocol acquisition and preprocessing steps are provided in the On-line Appendix.
Tumor Delineation and Segmentation
The ROI containing the lesion was manually segmented across contiguous sections on each Gd-T1WI, T2WI, and FLAIR sequence by an experienced radiologist (L.W.) with a hand-annotation tool in 3D Slicer (http://www.slicer.org). To assess the variability in features due to segmentation, we also segmented the ROI containing the lesion for every image with an automated brain tumor segmentation tool, BraTumIA (http://istb-software.unibe.ch/bratumia/MIA/BraTumIA.html).10 The degree of overlap, computed as Dice Index, across manual and automated segmentation was recorded, and the variability of texture features across manual and automated segmentation was reported as boxplots (On-line Fig 1).
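As a minimal sketch of the overlap measure mentioned above, the Dice Index between two segmentations A and B can be written as 2|A∩B| / (|A| + |B|). The representation of a segmentation as a set of voxel coordinates, the function name `dice_index`, and the toy masks are illustrative assumptions; they are not the tools or data used in the study.

```python
def dice_index(mask_a, mask_b):
    """Dice overlap between two segmentations given as sets of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks count as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical toy masks: (row, column) voxels on a single section.
manual = {(0, 0), (0, 1), (1, 0), (1, 1)}
automated = {(0, 0), (0, 1), (1, 1), (1, 2)}
print(dice_index(manual, automated))  # 3 shared voxels out of 4 + 4
```

A value of 1 indicates identical segmentations and 0 indicates no overlap.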
Radiomic Texture Features
A total of 119 2D radiomic texture features on a per-voxel basis were extracted from every expert-annotated lesion on contiguous sections of a patient study. For every patient study, a median feature value was calculated from the feature responses of all voxels from across all sections associated with each annotated lesion. All feature calculations were performed by using in-house software implemented in the Matlab R 2014b platform (MathWorks, Natick, Massachusetts). A total of 13 Haralick, 25 Laws, 24 Laplacian pyramid features, and 20 Histogram of Gradient orientations features were computed in Matlab. A detailed description of a few representative radiomic features is provided in On-line Table 1.
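The per-voxel extraction followed by a per-study median might be sketched as follows. This is not the in-house Matlab implementation: the single feature shown (energy of a horizontal gray-level co-occurrence matrix, one Haralick-type measure), the 3 × 3 window, the assumption that each section is already cropped to the lesion and quantized to integer gray levels, and the names `glcm_energy` and `study_feature` are all illustrative choices.

```python
from statistics import median


def glcm_energy(window, levels):
    """Energy (angular second moment) of a symmetric horizontal co-occurrence matrix."""
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in window:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
            counts[right][left] += 1  # count each pair in both directions
            total += 2
    return sum((c / total) ** 2 for r in counts for c in r)


def study_feature(sections, levels, size=3):
    """Median of the windowed feature responses over all sections of one study."""
    responses = []
    for img in sections:
        for i in range(len(img) - size + 1):
            for j in range(len(img[0]) - size + 1):
                window = [r[j:j + size] for r in img[i:i + size]]
                responses.append(glcm_energy(window, levels))
    return median(responses)


# Hypothetical quantized lesion patch with 2 gray levels.
section = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
print(study_feature([section], levels=2))
```

Using the median rather than the mean keeps the study-level value robust to a few outlying voxel responses.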
Feature Selection and Classification on the Training Cohort
To identify the most discriminating radiomic features across each of the MR imaging sequences on the training cohort, we used minimum redundancy and maximum relevance (mRmR)11 feature-selection analysis in a sequential feed-forward fashion by using a Matlab R 2014b platform.12 The sequential feed-forward algorithm is a bottom-up search approach, which starts from an empty feature set and gradually adds features selected via mRmR so that redundant features are removed while maximizing discrimination between the 2 classes (RN and tumor recurrence). Feed-forward mRmR feature selection was used in combination with a support vector machine (SVM) classifier,13 and the performance metric was an area under the receiver operating characteristic curve (AUC). In our setup, we chose the top 5 most discriminative features from mRmR for each classification task (RN versus tumor recurrence in primary and metastatic subgroups, respectively). Inclusion of >5 features did not improve the AUC of the classifier within the training set. Hence, we limited inclusion of features to the classifier to just the top 5. To mitigate selection and classifier training bias, we used a 3-fold (1-fold held out for testing), patient-stratified, cross-validation scheme, which was repeated 100 times. The best 5 features were identified as the ones that most frequently appeared in the set of the top 5 most discriminative features across 100 runs of 3-fold cross-validation. The analysis was performed independently for every MR imaging sequence.
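The resampling and voting logic described above can be sketched in a few lines. The mRmR ranking and SVM training inside each fold are omitted; the functions below only illustrate a class-stratified 3-fold split and the selection of the features that appear most often across runs. The function names, the alphabetical tie-breaking, and the toy feature names are assumptions made for illustration.

```python
import random
from collections import Counter


def stratified_folds(labels, n_folds=3, seed=0):
    """Split study indices into folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def most_frequent_features(selected_per_run, k=5):
    """Return the k feature names selected most often across all runs."""
    counts = Counter(name for run in selected_per_run for name in run)
    # Sort by descending frequency, then by name so ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Primary training cohort sizes from the text: 12 recurrent tumor (1), 10 RN (0).
labels = [1] * 12 + [0] * 10
folds = stratified_folds(labels)
print([len(f) for f in folds])

# Hypothetical per-run selections (feature names are placeholders).
runs = [["a", "b", "c"], ["a", "b", "d"], ["a", "c", "d"]]
print(most_frequent_features(runs, k=2))
```

In a full pipeline, each of the 100 repetitions would call `stratified_folds` with a different seed and append the features chosen in each fold to `runs`.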
Classification on the Test Cohort
The independent test cohort was evaluated within an SVM classifier by using the top 5 most discriminatory features identified during training, with evaluation being performed separately for the primary and metastatic subgroups. The radiomics classifier output was obtained as a binary output in which an output of 1 represented tumor recurrence and zero represented RN for each of the studies in the test cohort. In cases in which >1 MR imaging sequence was available, a consensus (agreement of outcome in 2 of 2 or 2 of 3 sequences) across the binary classifier outputs was used as the final outcome of the radiomics classifier. For cases in which 2 sequences showed disagreement in diagnosis, the output of the sequence, identified as the most discriminating within the training cohort, was used to make the final decision. The final radiomics classifier output for every patient study was compared against the histopathologic findings to report the accuracy values.
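The consensus rule above can be expressed as a short function. This is a sketch under stated assumptions: the per-sequence outputs are passed as a dictionary keyed by sequence name, and the tie-breaking sequence defaults to FLAIR because the Results report it as the most discriminating sequence on the training cohort; the function name `consensus_call` is hypothetical.

```python
def consensus_call(outputs, best_sequence="FLAIR"):
    """Combine per-sequence binary outputs (1 = tumor recurrence, 0 = RN).

    A majority (1 of 1, 2 of 2, or 2 of 3) decides the final output; when
    2 sequences disagree, the output of `best_sequence` is used instead.
    """
    votes = list(outputs.values())
    ones = sum(votes)
    zeros = len(votes) - ones
    if ones != zeros:
        return 1 if ones > zeros else 0
    return outputs[best_sequence]  # tie: defer to the most discriminating sequence


print(consensus_call({"FLAIR": 0, "T2WI": 1, "Gd-T1WI": 1}))  # 2 of 3 agree
print(consensus_call({"FLAIR": 0, "Gd-T1WI": 1}))             # tie, FLAIR decides
```

Because FLAIR was available for every study in the test cohort, the tie-break lookup always succeeds in that setting; with other data, a missing `best_sequence` would raise a `KeyError`.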
Statistical Analysis
Statistical analysis was performed by using a nonparametric Wilcoxon signed rank test to compare the differences in feature values across different feature sets for every MR imaging sequence, independently for the primary and metastatic cohorts. To make the statistical significance test more stringent, the significance threshold was adjusted to P = .00125 with a Bonferroni correction to control for type I errors. All statistical analyses were implemented in the Matlab R2014b platform (MathWorks). All reported confidence intervals are 95% confidence intervals.
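For readers unfamiliar with the Bonferroni adjustment, the corrected threshold is the family-wise significance level divided by the number of comparisons. The text does not state either quantity; the values 0.05 and 40 below are assumed only because they reproduce the reported threshold of .00125 and should not be read as the authors' actual settings.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni adjustment."""
    return alpha / n_comparisons


# Assumed values (not given in the text): alpha = 0.05 over 40 comparisons.
print(bonferroni_threshold(0.05, 40))
```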
Comparative Multireader Study
Two board-certified neuroradiologists with 2 years of experience (A.P.N., A.G.), blinded to the pathology reports, read the MR images (Gd-T1WI, T2WI, and FLAIR, as available) to diagnose the presence of RN or tumor recurrence on each of the holdout studies. The same sequence scans exposed to the SVM classifier were provided to the expert readers to avoid any comparison bias. Neither the machine-learning classifier nor the readers had access to the baseline, pretreatment scan. The only additional information to which the expert readers had access was the type of tumor (eg, oligodendroglioma, glioblastoma) for the primary brain tumor cohort and the location of primary disease for the metastatic brain tumor cohort. No additional clinical information (eg, age, sex, Karnofsky performance score) was provided. The readers were allowed to go back to the scans multiple times as required to make their final diagnosis. Both readers independently assigned a probability score (between 0.5 and 1 in increments of 0.1) for every study as belonging to either RN or tumor recurrence based on the confidence in their diagnostic call. A confidence of 0.5 denotes that the expert was uncertain of the diagnosis, while a confidence of 1 denotes that the expert was completely confident in his or her diagnosis of RN or tumor recurrence. The probability scores and the resulting diagnosis for every threshold value for every study for the 2 experts are provided in On-line Table 3.
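The way a confidence threshold turns reader scores into diagnostic calls might be sketched as below. Two points are assumptions rather than statements from the text: a read whose confidence does not exceed the threshold is treated as "unsure" and counted as not correct, and the reads shown are invented toy values, not the study's data. The function names are hypothetical.

```python
def reader_call(diagnosis, confidence, threshold=0.5):
    """Return the reader's diagnosis if confidence exceeds the threshold, else None."""
    return diagnosis if confidence > threshold else None


def reader_accuracy(reads, truth, threshold=0.5):
    """Fraction of studies called correctly; unsure reads count as not correct."""
    correct = sum(
        1
        for (diagnosis, confidence), reference in zip(reads, truth)
        if reader_call(diagnosis, confidence, threshold) == reference
    )
    return correct / len(truth)


# Invented example reads: (diagnosis, confidence score).
reads = [("RN", 0.9), ("recurrence", 0.5), ("recurrence", 0.7), ("RN", 0.6)]
truth = ["RN", "recurrence", "recurrence", "recurrence"]
print(reader_accuracy(reads, truth))                 # threshold > 0.5
print(reader_accuracy(reads, truth, threshold=0.8))  # stricter threshold
```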
Results
RN versus Recurrent Tumors in the Primary Brain Tumor Subgroup
Feature Discovery and Classification on the Training Cohort.
Tumor delineation showed excellent agreement between the manual and automated segmentation (Dice Index range, 0.8–0.9). Differences in tumor volume and age across the 2 conditions were not statistically significant. The top radiomic features obtained by using mRmR for the primary brain tumor cohort are shown in On-line Table 2 and qualitatively represented in Fig 1. The average feature values for all 3 of the most discriminative features in the primary cohort were found to be statistically significantly different across RN and tumor recurrence (P < .001). Correlation, energy, and Laws features (combination of level and edge filters [L5E5]) in the lower Laplacian scale space were consistently identified as the most discriminative ones in distinguishing the 2 classes across all 3 sequences.
The best performing feature sets in distinguishing RN and tumor recurrence were obtained for FLAIR, with reported AUC and accuracy values of 0.79 ± 0.05; 95% confidence interval, 0.77–0.81; and 0.75 ± 0.05; 95% CI, 0.73–0.77. This was followed by the feature set for T2WI with reported AUC and accuracy values of 0.77 ± 0.06; 95% CI, 0.74–0.80; and 0.72 ± 0.08; 95% CI, 0.69–0.75, respectively. Notably, Gd-T1-weighted MR imaging, a routinely used sequence in the clinic for response assessment, was ranked lowest in terms of accuracy (0.57 ± 0.07) and AUC (0.57 ± 0.08) across all 3 sequences for the primary brain tumor cohort.
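For reference, the AUC reported here can be interpreted as the probability that a randomly chosen tumor recurrence case receives a higher classifier score than a randomly chosen RN case. The pairwise computation below is a minimal sketch of that definition, not the authors' Matlab code; the function name `roc_auc` and the scores are illustrative assumptions.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels: 1 = tumor recurrence, 0 = RN. Tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented classifier scores for 4 studies.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 3 of the 4 pairs are ordered correctly
```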
Classification of the Independent Test Cohort.
The accuracies of the classifier and the 2 expert readers for the holdout studies, both for primary and metastatic cases, are reported in Table 2. On-line Table 3 shows the results of the SVM classifier obtained on the test cohort by using the top 5 features identified via the mRmR feature-selection method. On the basis of the output of the classifier trained either on the FLAIR sequence alone (identified as best performing sequence on the training cohort) or by using the highest consensus (agreement in 2 of 2 or 2 of 3 sequences), 10 of 11 studies were correctly classified. For a threshold of >0.5 in their confidence scores, radiologist 1 diagnosed 5 of 11 cases correctly and was unsure about 1 (study 10) (confidence score = 0.5), while radiologist 2 correctly identified 6 of the 11 cases. The overall accuracy for radiologist 1 was 45%, while for radiologist 2, it was 54%. One of the patient studies (study 7) comprised a “mixed” distribution of pathologies with 75% of RN and 25% of the tissue composing tumor recurrence. However, the patient was clinically treated as a case of recurrence leading to surgical resection; hence, the final diagnosis was a recurrence. Study 4 on the pathology report was identified as having equal (30%) proportions of RN and tumor recurrence and was identified as having recurrence by the SVM classifier and the 2 expert readers.
RN versus Recurrent Tumors in Metastatic Brain Tumor Subgroup
Feature Discovery and Classification on the Training Cohort.
The top-ranked features in distinguishing RN and tumor recurrence as obtained from the mRmR feature-selection experiment for the metastatic brain tumor cohort are shown in On-line Tables 2 and 3. Difference variance (in the Laplacian pyramid domain), sum average (in Laplacian pyramid domain), correlation, and correlation (in Laplacian pyramid domain), along with Laws features, were consistently identified as key discriminative features across the 3 sequences. Several features were consistently identified as discriminative of RN and tumor recurrence across the 3 MR imaging sequences in both the primary and metastatic brain tumor cohorts. Unlike in the primary brain tumor cohort, the feature values of the 3 most discriminating feature sets obtained for the metastatic cohort were not found to be statistically significantly different between RN and tumor recurrence.
The 5 most discriminating features (identified via mRmR on the training cohort) for Gd-T1WI, T2WI, and FLAIR for the metastatic brain tumor cohort are listed in On-line Table 2. Similar to the primary brain tumor cohort, the most discriminative feature set was obtained for FLAIR with AUC and accuracy values of 0.79 ± 0.09; 95% CI, 0.75–0.83; and 0.75 ± 0.06; 95% CI, 0.72–0.78, respectively. This was followed by features extracted from the Gd-T1 sequence, which had AUC and accuracy values of 0.69 ± 0.08; 95% CI, 0.66–0.72; and 0.64 ± 0.07; 95% CI, 0.61–0.67, respectively. While Gd-T1 MR imaging was ranked lowest in terms of accuracy and AUC for the primary brain tumor cohort, T2-weighted MR imaging was identified as the sequence with the lowest accuracy and AUC in distinguishing the 2 classes in the metastatic brain tumor subgroup.
Classification of the Independent Holdout Cohort.
As shown in On-line Table 3, 2 of 4 metastatic brain tumor recurrence cases (cases 2 and 3) in the holdout cohort were correctly identified by the SVM classifier (Table 2). Both neuroradiologists correctly and consistently identified 2 of the 4 cases (cases 1 and 3). Case 4 was incorrectly classified by the SVM classifier and the 2 expert readers.
Discussion
Differentiating RN from recurrent brain tumors is one of the most challenging clinical dilemmas in neuro-oncology due to the similar appearance of the 2 conditions on standard MR imaging. In this study, we investigated the feasibility of computerized texture features in distinguishing RN and tumor recurrence on Gd-T1WI, T2WI, and FLAIR across primary and metastatic brain tumor subgroups in a limited cohort of studies obtained from 2 different institutions.
A strength of this study is that the definition of RN versus recurrent tumor for our training cohort was stricter than that in many previously reported imaging studies, which have defined RN and recurrence on the basis of the suspicion of the disease on follow-up MRIs (with no histopathology confirmation).6,14 To evaluate the performance of our classifier across routinely seen clinical MR imaging studies, we allowed both mixed and predominant RN/tumor recurrence cases to be included within our holdout test cohort obtained from another institution (University of Texas Southwestern Medical Center). The holdout test cohort was also independently analyzed by 2 expert readers who were blinded to the pathology reports and clinical findings. The visual traits for tumor recurrence that the 2 expert neuroradiologists took into account while making the diagnosis included expansile lesion; solid, nodular, or ringlike well-defined enhancements; and internal hemorrhages (in primary tumors). Similarly, the experts identified feathery, geographic, and incomplete enhancements usually associated with predominantly radiation-induced effects.
Our study identified Laplacian pyramid texture features as being discriminative, possibly because this class of features emphasizes edge-related differences between tumor recurrence and RN at lower resolutions. Similar to the visual features reported by our expert readers, Reddy et al15 have previously reported meshlike diffuse enhancement and rim enhancement with feathery indistinct margins as characteristic of RN. In contrast, tumor recurrence is reported to have focal solid nodules and solid uniform enhancement with distinct margins.15 Likewise, Laws features, which enable capture of a combination of different edge, level, and spot patterns within the lesion, were identified as being discriminative possibly because they implicitly model the so-called soap bubble and Swiss cheese patterns that have previously been suggested as associated with RN on Gd-T1WI. Additionally, it has been suggested that RN is associated with a diffuse pattern characterized by periventricular white matter changes,2 while tumor recurrence has been suggested to be associated with hyper-/hypointensities indicative of hemorrhagic changes on Gd-T1WI, T2WI, and FLAIR MR imaging. Haralick texture features modeled on co-occurring intensity patterns and higher order image derivatives may be capturing these hemorrhagic changes on FLAIR and T2WI sequences.
Of the 3 MR imaging sequences (Gd-T1WI, T2WI, and FLAIR), FLAIR was identified as the most discriminative in the training cohort in terms of AUC and accuracy for both primary and metastatic brain tumor cohorts. FLAIR is highly sensitive, but not specific, for identifying coexisting tumor and edema.16 In cases of invasive primary brain tumors, malignant tumor cells have been found up to 4 cm away from contrast-enhancing regions,17 with >90% of the cases of recurrence occurring close to the tumor margin. Future studies could address the role of radiomic features obtained from peritumoral edema as complementary measurements to further improve the diagnosis of RN from tumor recurrence noninvasively.
A recent study14 used Haralick and wavelet texture features on Gd-T1-weighted MR imaging to distinguish RN from metastatic brain tumor recurrence with a reported AUC of 94%. However, we believe that the results, reported on a per-section basis, may have been affected by the classifier being contaminated by sections from the same patient being used both in the training and testing sets during classification. Additionally, in most cases, the clinical diagnosis was assessed by clinical and radiologic follow-up (as opposed to a more reliable histologic confirmation).
Our study did have its limitations. As a feasibility study, the reported results are preliminary because our study was limited by a relatively small sample size, both for the training and holdout cohorts. However, to the extent possible, a rigorous statistical analysis was performed to evaluate the classification results. Although the different image preprocessing steps performed did not explicitly account for varying signal-to-noise ratios due to different magnetic fields, this effect was largely mitigated because all the computerized image-based features were derived from cumulative statistics (median) of many pixels. The comparison between expert readers and the radiomics classifier was kept unbiased to the extent possible by ensuring that the expert readers were provided the same routine scans that were available to the classifier. However, the 2 expert readers who performed the analysis had only 2 years of experience as board-certified neuroradiologists. Additionally, the readers did not have access to all 3 sequences in some cases (due to those sequences not being available). While we attempted to control for heterogeneity in patient studies by separately assessing primary and metastatic tumor cohorts, a larger dataset is required to identify the influence of other variables on feature selection, such as treatment type and dose.
Conclusions
In this feasibility study, we investigated the role of texture features in distinguishing radiation necrosis from recurrent brain tumors on Gd-T1WI, T2WI, and FLAIR sequences obtained from 2 different sites, across 2 subgroups of studies, primary and metastatic brain tumors. Our results suggest that radiomic analysis on routinely acquired MR imaging might enable discrimination of RN and tumor recurrence both for primary and metastatic brain tumors. Future work will focus on exploring the added value of texture features along with the diagnostic reads from expert radiologists as a part of a prospective clinical study. We will also prospectively validate the features identified in this study on a larger, multi-institutional cohort, in the context of differentiating both pseudoprogression and radiation necrosis from tumor recurrence.
Footnotes
Disclosures: Lee Wolansky—UNRELATED: Board Membership: Immunocellular Therapeutics, Comments: Data Monitoring Committee; Consultancy: BioClinica,* Comments: consultant on brain tumor studies. Anant Madabhushi—UNRELATED: Board Membership: Inspirata; Consultancy: Inspirata; Grants/Grants Pending: Inspirata*; Patents (planned, pending or issued): Inspirata,* Elucid Bioimaging*; Royalties: Inspirata, Elucid Bioimaging; Stock/Stock Options: Inspirata, Elucid Bioimaging. *Money paid to the institution.
Research reported in this publication was supported by the Coulter Translational Award; Ohio Third Frontier Technology Validation grant; National Science Foundation Innovation Corps; National Cancer Institute of the National Institutes of Health under award numbers R01CA136535-01, R01CA140772-01, and R21CA167811-01; National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under award number R43EB015199-01; National Science Foundation under award number IIP-1248316; Clinical and Translational Science Collaborative Award UL1TR 000439; and SOURCE Summer Research Program and Case Western Alumni Association.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
- Received April 12, 2016.
- Accepted after revision July 16, 2016.
- © 2016 by American Journal of Neuroradiology