Abstract
BACKGROUND AND PURPOSE: Cerebral aneurysms carry a risk of rupture, severe disability, and death. Thus, early detection of cerebral aneurysms is crucial to ensure timely treatment, if necessary. AI-based software tools are expected to enhance radiologists' performance in detecting pathologies like cerebral aneurysms in the future. Our aim was to evaluate the diagnostic performance of an artificial intelligence–based software designed to detect intracranial aneurysms on TOF-MRA.
MATERIALS AND METHODS: One hundred ninety-one MR imaging data sets were analyzed using the software mdbrain for the presence of intracranial aneurysms on TOF-MRA obtained using two 3T MR imaging scanners or a 1.5T MR imaging scanner according to our clinical standard protocol. The results were compared with the reading of an experienced radiologist as a criterion standard to measure the sensitivity, specificity, positive and negative predictive values, and accuracy of the software. Additionally, detection rates depending on size, morphology, and location of the aneurysms were evaluated.
RESULTS: Fifty-four aneurysms were detected by the expert reader. The overall sensitivity of the software for the detection of cerebral aneurysms was 70.4%, the specificity was 87.2%, and the accuracy was 82.6%. The positive predictive value was 67.9%, and the negative predictive value was 88.5%. We observed a sensitivity of 100% for saccular aneurysms of >5 mm without signs of thrombosis and low detection rates for fusiform or thrombosed aneurysms of 33.3% and 16.7%, respectively. Of 8 aneurysms that were not included in the initial written reports but were detected by the expert reader, retrospectively, 4 were detected by the software.
CONCLUSIONS: Our data suggest that the software can assist radiologists in reporting TOF-MRA. The software was highly reliable in detecting saccular aneurysms, while for fusiform or thrombosed aneurysms, further improvements are needed. Further studies are necessary to investigate the impact of the software on detection rates, interrater reliability, and reading times.
ABBREVIATION: AI = artificial intelligence
The prevalence of intracranial aneurysms has been estimated to be up to 2% of the population. They account for up to 85% of nontraumatic SAHs, potentially leading to severe disability and death.1 MR imaging and CT have been shown to be reliable tools for the detection of intracranial aneurysms with accuracies of up to 90%.2 Because the workload of radiologists has increased during the past years,3 innovative tools are needed to reduce this workload while maintaining or even improving the quality of patient care.
There have been early attempts to introduce conventional computer-aided diagnosis for the detection of intracranial aneurysms with sensitivities of 80% and 95%, respectively, but with the need to accept high rates of false-positive findings.4 Computer-aided diagnosis without the use of modern machine learning algorithms has also been shown to improve the diagnostic performance of radiologists in terms of the detection of cerebral aneurysms by TOF-MRA.5,6 More recently, research has shifted toward more advanced technologies using deep learning algorithms that showed promising results in the detection of intracranial aneurysms using both CTA7–9 and TOF-MRA.10–16 In addition, it has been shown that the use of deep learning software solutions could increase the reader’s performance in terms of the detection of aneurysms by TOF-MRA.17
Mdbrain (mediaire) is a CE-marked, commercially available software solution that has been designed to assist radiologists when reporting MR imaging of the brain. The authors of the current study had an early version of an add-on to mdbrain designed to automatically detect intracranial aneurysms on TOF-MRA images at their disposal and performed an independent, external validation of its diagnostic performance. We hypothesized that the software can assist reading radiologists in the detection of intracranial aneurysms. Therefore, we created a diverse study sample with a large variety of aneurysm sizes, locations, and morphologies acquired on different clinical MR imaging scanners at 3T and 1.5T to test the diagnostic performance and the generalizability of the software.
MATERIALS AND METHODS
Institutional review board approval was obtained for this retrospective study, and the need for written informed consent was waived.
The data set consisted of a total of 209 MR imaging studies with TOF-MRA obtained between March 2018 and January 2022. Most studies were consecutive cases from our PACS system, but the data set was enriched with cases with known aneurysms. Eighteen imaging studies were excluded due to poor image quality deemed nondiagnostic by the expert reader or due to major competing pathologies, mainly major cerebral hemorrhage suspected of creating false-positive results.
The MR imaging studies included were retrospectively reviewed by an experienced radiologist (with 6 years of experience in interpreting MR imaging of the brain) for the presence, localization, size, and configuration of intracranial aneurysms under full consideration of the patients’ clinical records, previous or subsequent imaging studies including DSA, and the respective written reports. For aneurysm size, the largest diameter of the aneurysm was measured using multiplanar reconstructions. This expert reading served as the diagnostic reference standard.
The images were obtained using either 2 clinical 3T MR imaging scanners (Achieva, Philips Healthcare; Discovery, GE Healthcare) or a 1.5T MR imaging scanner (Achieva). The patients were placed in the supine position. The axial 3D TOF sequences were acquired according to the routine clinical protocol used at our institution (TR = 19.33–20.12 ms; TE = 3.68–3.80 ms; section thickness = 1 mm; increment = 0.5 mm). FOV and matrix size were chosen according to the patient’s characteristics by the radiology technician. The studies were anonymized and processed by the artificial intelligence (AI)-based software solution mdbrain, Version 4. Along with a reconstructed TOF sequence highlighting the detected aneurysms in color with a bounding box, quantitative reports were sent to the PACS system. Those reports displayed representative images highlighting the largest aneurysm detected and quantitative measures of the size of each detected aneurysm (volume in microliters and diameter in millimeters). The sensitivity and specificity of the software could not be adjusted by the authors.
The underlying segmentation algorithm is based on a 3D convolutional neural network with a U-NET architecture.18 The model was trained on >100 brain MR imaging data sets of both healthy subjects and subjects with saccular cerebral aneurysms, obtained on a variety of Philips scanners, at 1T, 1.5T, and 3T, respectively. For each subject, the data consisted of a TOF-MRA scan as well as a corresponding binary mask of the unruptured aneurysms, as segmented by an expert radiologist. The training data set contained a total of 93 saccular aneurysms; 4 (4.3%) showed signs of partial thrombosis. The aneurysms were localized as follows: anterior communicating artery, 17%; A2 segment of the anterior cerebral artery, 10%; C6 segment of the ICA, 20%; C7 segment of the ICA, 22%; M1 or M2 segment of the MCA, 20%; and basilar artery, 9%. No fusiform aneurysms were included in the training process. Before training, all data were resampled to a fixed spacing, and the intensity was then normalized per image to zero mean and unit variance. During training of the neural network (using stochastic gradient descent), the input was provided in batches of patches, in which it was ensured that some patches contained aneurysm voxels and others did not. Augmentation was performed on the fly during training on the input patches to increase the generalization ability of the neural network. Mdbrain was purchased by the Department of Neuroradiology, University Hospital Bonn, at reduced cost. The authors had full control of the data and the information submitted for publication.
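The per-image intensity normalization and the mixed composition of training batches described above can be sketched as follows. This is a minimal illustration in Python and not the vendor's implementation: the function names, the toy intensities, the batch size, and the number of aneurysm-containing patches per batch are all assumptions made for the example, and resampling and augmentation are omitted.

```python
import random
import statistics


def zscore_normalize(intensities):
    """Rescale one image's voxel intensities to zero mean and unit variance."""
    mean = statistics.fmean(intensities)
    std = statistics.pstdev(intensities)
    if std == 0:
        # A constant image carries no contrast; map every voxel to zero.
        return [0.0 for _ in intensities]
    return [(value - mean) / std for value in intensities]


def sample_batch(positive_patches, negative_patches, batch_size, n_positive, rng):
    """Draw a batch containing exactly n_positive aneurysm-containing patches.

    The fixed count is an assumption for this sketch; the text only states
    that some patches contained aneurysm voxels and others did not.
    """
    if not 0 < n_positive < batch_size:
        raise ValueError("a batch must mix positive and negative patches")
    batch = rng.sample(positive_patches, n_positive)
    batch += rng.sample(negative_patches, batch_size - n_positive)
    rng.shuffle(batch)
    return batch


# Toy data (hypothetical): a flattened image and labeled patch identifiers.
image = [100.0, 120.0, 80.0, 140.0, 60.0]
normalized = zscore_normalize(image)

positives = [("pos", i) for i in range(10)]
negatives = [("neg", i) for i in range(50)]
batch = sample_batch(positives, negatives, batch_size=8, n_positive=2,
                     rng=random.Random(0))
```

Guaranteeing aneurysm-containing patches in every batch matters because aneurysm voxels are a tiny fraction of a TOF-MRA volume; purely random patch sampling would show the network almost exclusively background.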
Statistical analyses were performed with R statistical and computing software, Version 4.0.3 (http://www.r-project.org/) and R Studio, Version 1.2.5033 (http://rstudio.org/download/desktop) using the caret package.19 The diagnostic performance of the AI software was compared with the radiologist’s findings using confusion matrices. We calculated the overall sensitivity, specificity, positive predictive value, negative predictive value, and accuracy as well as for specific subgroups, like different aneurysm sizes, aneurysm localization (including extradural versus intradural in the anterior circulation), saccular and fusiform aneurysms, and aneurysms that showed signs of thrombosis or inhomogeneous signal intensity. The Mann-Whitney U test was used to determine statistical significance. Aneurysms that were detected by the radiologist and also by the software were defined as true-positive; those that were detected by the software but not by the radiologist were defined as false-positive. When there were no aneurysms reported by the software or the radiologist, the case was defined as true-negative. Each aneurysm that was missed by the software but detected by the radiologist was counted as a false-negative.
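To illustrate the kind of rank-based comparison that the Mann-Whitney U test performs on aneurysm diameters, a minimal sketch in Python is given below. It is not the authors' R code: the diameters are hypothetical values invented for the example, and the P value uses the normal approximation without tie or continuity correction, which is crude for samples this small.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U statistic of sample_a, two-sided P value).

    U counts the pairs in which a value from sample_a exceeds a value
    from sample_b; ties contribute one half. The P value comes from the
    normal approximation without tie or continuity correction.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided


# Hypothetical largest diameters in mm (NOT the study data).
detected_mm = [3.1, 4.0, 4.4, 5.2, 6.0, 7.5]
missed_mm = [1.3, 2.0, 2.3, 2.6, 4.3]

u_detected, p_value = mann_whitney_u(detected_mm, missed_mm)
u_missed, _ = mann_whitney_u(missed_mm, detected_mm)
```

The two U statistics always sum to the number of pairs, which is a convenient sanity check on any implementation.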
RESULTS
Our study sample consisted of 191 subjects with 54 aneurysms in total. One hundred nine (57.1%) subjects were women, and 82 patients were men. The mean age was 58.2 years (median, 62 years; range, 18–95 years). One hundred thirty-seven (71.7%) patients were scanned at 3T, and 54 patients were scanned at 1.5T. Forty-seven patients (24.6%) had at least 1 aneurysm by TOF-MRA, and 11 patients (5.8%) had >1 aneurysm. Twenty-eight (48.3%) aneurysms detected by the radiologist were angiographically proved. One aneurysm had a history of rupture with SAH. Fifty-one (94.4%) were saccular aneurysms, while the remaining 3 were classified as fusiform aneurysms. Six (11.1%) aneurysms showed signs of partial thrombosis. The mean largest diameter of the detected aneurysms was 7.3 mm (median, 4.1 mm; range, 1.2–45.4 mm). In a subgroup analysis, we also analyzed the diameters of saccular aneurysms without any sign of thrombosis, with a mean largest diameter of 4.3 mm (median, 3.9 mm; range, 1.3–10 mm). Thirty-nine (72.2%) aneurysms were located in the anterior circulation, while the remaining 15 were located in the posterior circulation. Forty-six (85.2%) of the 54 aneurysms were correctly reported in the initial written report, while the remaining 8 aneurysms were retrospectively found by the expert reader or were revealed by DSA and could be seen retrospectively on the initial MR imaging scan.
The cases finally included in our study could all be processed by the software. The overall diagnostic performance is summarized in the Online Supplemental Data. Examples of accurately and inaccurately reported findings are shown in Figs 1–3. The software detected a total of 56 aneurysms, of which 38 were true-positive findings; the remaining 18 were false-positive findings (0.1 false-positive/case). One hundred twenty-three studies were correctly classified as negative by the software, while 16 aneurysms found by the expert reader were missed by the software and declared false-negatives. The overall accuracy of the software was 82.6%, with a sensitivity of 70.4%, a specificity of 87.2%, a positive predictive value of 67.9%, and a negative predictive value of 88.5%. Three aneurysms were declared fusiform aneurysms by the expert reader, of which 2 (66.7%) were not detected by the software. In addition, 6 aneurysms showed signs of partial thrombosis, of which only 1 (16.7%) was correctly detected by the software, while the remaining 5 (83.3%) were not detected by the software. The remaining 11 aneurysms that were not detected by the software were saccular aneurysms with no signs of thrombosis. The mean largest diameter of the saccular aneurysms with no signs of thrombosis missed by the software was 2.2 mm (median, 2.3 mm; range, 1.3–4.3 mm).
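The reported performance values follow directly from the four counts given above (38 true-positives, 18 false-positives, 123 true-negatives, 16 false-negatives). A short Python sketch of this arithmetic is shown below; the function name is an assumption for illustration, and the analysis in the study itself was performed in R. Note that true-positives and false-negatives are counted per aneurysm, whereas true-negatives are counted per study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive predictive value": tp / (tp + fp),
        "negative predictive value": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Counts as reported in the Results section.
metrics = diagnostic_metrics(tp=38, fp=18, tn=123, fn=16)

# False-positive findings per case: 18 findings across 191 studies.
fp_per_case = 18 / 191

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
print(f"false-positives per case: {fp_per_case:.1f}")
```

Running the sketch reproduces a sensitivity of 70.4%, a specificity of 87.2%, a positive predictive value of 67.9%, a negative predictive value of 88.5%, an accuracy of 82.6%, and 0.1 false-positives per case.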
There was a statistically significant difference between saccular aneurysms with no signs of thrombosis detected by the software and those that were not detected in terms of largest diameter (P = .04). Seven of the 11 aneurysms missed by the software were located at the C4 or C5 level of the ICA; the remaining aneurysms missed by the software were located at the anterior communicating artery (n = 1), posterior communicating artery (n = 1), basilar artery (basilar tip excluded, n = 1), and the superior cerebellar artery (n = 1). We observed that 50% of the aneurysms located at the ICA, levels C1–C4, were correctly diagnosed by the software. For supraophthalmic aneurysms in the anterior circulation, the sensitivity was 77.8%, with an accuracy of 85.7%. A detailed overview of the different localizations of the aneurysms and the detection rates of the algorithm is shown in the Table. For saccular aneurysms with diameters of ≥5 mm and no signs of thrombosis or inhomogeneous signal intensity, the sensitivity, specificity, and accuracy rose up to 100%, 87.2%, and 88.0%, respectively. Four of the 8 aneurysms initially missed in the original reports were correctly detected by the software.
DISCUSSION
This single-center study compared the diagnostic performance of an AI-based software trained on TOF-MRI studies to detect cerebral aneurysms with an expert radiologist’s reading of 191 TOF-MRI studies.
Our goal was to test the software performance with a data set covering the variety seen in routine clinical care. In a patient cohort with a large range of ages, aneurysm sizes, configurations, and localizations examined with scanners of 2 different vendors at different field strengths, the software solution showed an overall accuracy of 82.6%, with a sensitivity and specificity of 70.4% and 87.2%, respectively. Our data suggest that the software can help the reading radiologist in detecting aneurysms when reporting TOF-MRI studies. Eight aneurysms found by the expert reader had not been reported in the initial, written reports. Four of these aneurysms were correctly detected by the software and thus would not have been missed if the software had been used in the clinical practice.
Other investigators have already worked on software solutions to automatically detect aneurysms in TOF-MRI studies. Sichtermann et al13 achieved sensitivities as high as 90% with their convolutional neural network–based approach, but with 6.1 false-positives per case. When Sichtermann et al13 shifted toward more acceptable false-positive rates of 0.8 per case, the sensitivity decreased to 79%. Ueda et al11 achieved sensitivities of 93% in their test data set, which was acquired at 4 different institutions, but they reported no specificity, only that their focus was not to miss aneurysms because their algorithm was intended to assist radiologists in not missing cerebral aneurysms. Stember et al12 reported a sensitivity of 98.8% for the detection of cerebral aneurysms with only 1 of 86 aneurysms missed by their algorithm. However, they used only MIP images of TOF-MRA, while the aforementioned studies all used source images of 3D TOF-MRAs. In addition, they excluded aneurysms of <3 mm. Nakao et al10 reported a sensitivity of 94.2%, but with a high false-positive rate of 2.9 per case. At a sensitivity of 70%, they reported 0.26 false-positives per case. Like Stember et al, they used MIP images of TOF-MRAs for training, validation, and testing of their algorithm. Terasaki et al14 achieved a sensitivity of up to 89.1% with a rate of 4.2 false-positives per case. Chen et al15 reported a sensitivity of 82.1% with a false-positive rate of 0.86 per case. Claux et al16 reported a sensitivity of 78% with a rate of 0.5 false-positives per case.
While the sensitivities we report seem lower than those of the aforementioned studies, there are some differences: First, we performed an external validation, a crucial step in validating algorithms designed to assist radiologists because it guards against overfitting to the test data set and demonstrates the generalizability of the software solution.20 Our data set was acquired on 3 different clinical scanners at an institution entirely different from the one that provided the data used to train, validate, and test the software solution we report. In contrast, the aforementioned studies all reported the diagnostic performance of their algorithms on their test data sets that were acquired at the same institutions as the data sets used for training and validation, though Ueda et al11 and Terasaki et al14 tried to avoid overfitting by using images from 4 different institutions. To the best of our knowledge, there have been no studies published on the diagnostic performance of other commercially available AI-tools addressing the automated detection of brain aneurysms by TOF-MRA with which we could compare our results.
Second, we included not only MR imaging studies with aneurysms in our data set but also a high number of studies with negative findings showing no aneurysms, aiming for a more realistic cohort to learn how reliably the algorithm can rule out the presence of aneurysms in studies that have been read as having normal findings by the expert reader. However, our data set still does not reflect reality because we enriched the cohort with patients who had aneurysms, allowing us to further investigate different localizations, sizes, and configurations of the aneurysms detected. Third, the studies mentioned above reported high sensitivities but, in the case of Nakao et al10 and Sichtermann et al13, also a high rate of false-positive findings. When the rate of false-positives was reduced, their sensitivities decreased to 70% and 79%, respectively,10,13 values that are comparable with our findings, given that we found a rate of only 0.1 false-positive finding per case. Because AI software solutions like mdbrain will likely not only be used on high-risk populations but will also be available for every examination acquired with no regard for the risk constellation, we regard a low rate of false-positive findings as highly important, mainly to reduce the risk of unnecessary follow-up examinations but also to actually reduce the workload of the radiologists using the software. Fourth, 2 of the studies used MIP images instead of source images of 3D TOF-MRA; thus, the comparability with our study is limited.
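The trade-off between sensitivity and false-positive rate discussed above becomes easier to see when the operating points quoted in this Discussion are placed side by side. The Python sketch below does only that; the values are those cited in the text, `None` marks studies for which the text gives no false-positive rate, and the cutoff of 1 false-positive per case is an arbitrary assumption chosen for illustration.

```python
# (study, sensitivity in %, false-positives per case or None if not given above)
operating_points = [
    ("Sichtermann et al", 90.0, 6.1),
    ("Sichtermann et al", 79.0, 0.8),
    ("Ueda et al", 93.0, None),
    ("Stember et al", 98.8, None),
    ("Nakao et al", 94.2, 2.9),
    ("Nakao et al", 70.0, 0.26),
    ("Terasaki et al", 89.1, 4.2),
    ("Chen et al", 82.1, 0.86),
    ("Claux et al", 78.0, 0.5),
    ("Present study", 70.4, 0.1),
]

# Assumed cutoff for a "low" false-positive rate (not taken from the study).
MAX_FP_PER_CASE = 1.0


def low_false_positive_points(points, max_fp):
    """Keep operating points with a stated rate at or below max_fp, sorted by rate."""
    kept = [p for p in points if p[2] is not None and p[2] <= max_fp]
    return sorted(kept, key=lambda p: p[2])


comparable = low_false_positive_points(operating_points, MAX_FP_PER_CASE)
for study, sensitivity, fp_rate in comparable:
    print(f"{study}: sensitivity {sensitivity}% at {fp_rate} false-positives per case")
```

Restricted to operating points with at most 1 false-positive per case, the quoted sensitivities range from 70% to 82.1%, which is the range in which the present result falls.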
In a second step, we further evaluated different subgroups of aneurysms to further investigate the performance of the software. Most interesting, fusiform aneurysms were detected in only 1 of 3 cases, probably because the software has been trained only on saccular aneurysms and the current version is not recommended for use on fusiform aneurysms. Also, aneurysms that showed signs of thrombosis or inhomogeneous signal intensity by TOF-MRA were detected in only 1 of 6 cases, a phenomenon that has similarly been reported by Ueda et al.11 We suspect that the low detection rate in this subgroup is due to the inhomogeneous signal within the aneurysms, making it more difficult for the algorithm to correctly segment the vessel and the aneurysm to their full extent. Also, the training data set contained only 4 cases of aneurysms with signs of partial thrombosis. Our findings may motivate further optimization of AI-based aneurysm detection for such cases.
While 5 aneurysms that were missed by the algorithm were either fusiform, thrombosed, or both, the remaining 11 aneurysms that were missed by the software were saccular aneurysms with regular signal intensity. Their mean diameter was 2.2 mm, compared with an overall mean diameter of 7.3 mm, with a statistically significant difference for aneurysm size in saccular aneurysms with no signs of thrombosis, so we may conclude that besides fusiform or thrombosed aneurysms, small aneurysms cannot be reliably excluded using the software. Here, one must also take into account that even an experienced reader can misinterpret infundibular artery origins or inhomogeneous flow signal as small aneurysms when no DSA data are available. In contrast, no saccular aneurysms with diameters of ≥5 mm with no signs of thrombosis were missed by the software, highlighting its potential use for clinically relevant findings that should not be missed.
As a third step, we investigated the diagnostic performance depending on the location of the aneurysms. Due to the low number of cases, we can only describe our findings: For infraophthalmic aneurysms in the anterior circulation, we found a sensitivity of only 50%. We hypothesize that the curvature of the vessel as well as the inhomogeneous signal intensity that can be observed in these regions account for this low detection rate. Also, there were no infraophthalmic ICA aneurysms included in the training data set, very likely contributing to this comparably poor result. For supraophthalmic aneurysms in the anterior circulation, we found a higher sensitivity of 77.8%. This subgroup is clinically more relevant because supraophthalmic aneurysms are at risk of causing SAH, whereas infraophthalmic aneurysms are not, due to their extradural localization.21
Our study had some limitations. First, its retrospective nature, all MR imaging studies being acquired at the same institution, both the training data set and most of our MR imaging studies being acquired on MR imaging scanners of the same vendor, and our study sample being enriched with known aneurysms limit the possibility of evaluating the use of the algorithm in the setting of everyday clinical practice. However, it can serve as an external validation of the software solution because our data set was not acquired at the same institution as the data sets used for training, validation, and testing. Furthermore, our images were acquired on 3 different scanners by 2 different vendors, at field strengths of 3T and 1.5T, making them a heterogeneous study sample, representing the variety of examinations seen in our daily clinical routine. However, additional studies may be necessary to investigate how the software performs on images obtained on MR imaging scanners of different vendors.
Second, the number of cases is too small to draw final conclusions on differences depending on aneurysm locations; thus, our findings regarding location are descriptive in nature. Third, the software was designed not to replace radiologists but to support them in detecting aneurysms, whereas our study investigated the diagnostic performance of the software alone against an experienced human reader. Sohn et al17 reported the improved diagnostic performance of a neurologist, a neurosurgeon, and a radiologist for the detection of cerebral aneurysms by TOF-MRA when supported by an AI software solution compared with their diagnostic performance without the support of the software. While we suspect that mdbrain can have a similar effect on the performance of readers, this was not systematically assessed in our study, and whether the software can assist radiologists in their daily work will be a matter of further investigation.
CONCLUSIONS
In our study, we assessed the potential of a commercially available and CE-marked software solution to automatically detect intracranial aneurysms on TOF-MRI data. Thus, our findings are important to radiologists using the software, to understand its capabilities but also its limitations. We demonstrated that the software has the potential to increase the detection rates for intracranial aneurysms while showing an acceptable rate of false-positive findings. There is need for further investigation to learn whether the software can assist radiologists in their daily routine to improve detection rates, interrater reliability, and reading times.
Footnotes
Mediaire GmbH provided technical details on the algorithm, including the training and validation process. The company had no influence on the conceptualization of the study or data acquisition or data interpretation.
Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.
- Received May 3, 2022.
- Accepted after revision October 5, 2022.
- © 2022 by American Journal of Neuroradiology