BACKGROUND AND AIM
One of the main issues in radiomics is the heterogeneity of data and analysis methods: each step of a radiomic study hides pitfalls that can combine and ultimately lead to the failure of the whole process.1-3 Several works in the literature address single sources of variability, but they do not draw conclusions on the whole workflow.4-6 The aim of this work was to analyse the main sources of variability in textural radiomic features (RF) across the different steps of the radiomic workflow,1 to quantify its extent, and to evaluate possible recommendations for reducing it.
MATERIALS AND METHODS
For each step of the radiomic workflow, potential sources of variability concerning CT imaging were analysed (Figure 1).
The authors focussed on intrascanner repeatability, interscanner reproducibility, tube voltage, and automated workload in the acquisition step; slice thickness, interval, algorithm, and kernel in the reconstruction step; inter-reader and interformat variability in the segmentation step; and voxel resampling and parameters in the feature extraction step.
The analyses were performed on Catphan® (The Phantom Laboratory, Salem, New York, USA) acquisitions and on patients’ images. A wide set of scanner manufacturers and models from different centres was considered. The software packages used for segmentation and feature extraction were IntelliSpace Portal 8.0 (Philips Medical Systems, Amsterdam, the Netherlands), 3DSlicer,7 and IBEX.8
The effect of the different sources of variability on RF was expressed in terms of relative standard deviation (RSD) or relative discrepancy.
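As an illustration of the two metrics, the following is a minimal sketch and not the authors' implementation. It assumes that RSD is the sample standard deviation divided by the absolute mean, and that relative discrepancy is the absolute difference from a reference value divided by the absolute reference, both expressed as percentages. The function names and the feature values are hypothetical.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """RSD in percent: sample standard deviation divided by |mean| (assumed definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * stdev(values) / abs(m)


def relative_discrepancy(value, reference):
    """Relative discrepancy in percent with respect to a reference (assumed definition)."""
    if reference == 0:
        raise ValueError("relative discrepancy is undefined for a zero reference")
    return 100.0 * abs(value - reference) / abs(reference)


# Hypothetical values of one textural feature over five repeated acquisitions.
repeated_acquisitions = [0.52, 0.49, 0.55, 0.50, 0.54]
rsd_percent = relative_standard_deviation(repeated_acquisitions)

# Hypothetical values of the same feature for a reference and an alternative kernel.
discrepancy_percent = relative_discrepancy(value=0.80, reference=0.50)

print(f"RSD: {rsd_percent:.1f}%")
print(f"Relative discrepancy: {discrepancy_percent:.1f}%")
```

RSD suits comparisons across more than two conditions (for example, several scanners), whereas relative discrepancy suits a pairwise comparison against a chosen reference setting.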
RESULTS
In the image acquisition step, intrascanner repeatability, interscanner reproducibility, and tube voltage caused high RF variability, with RSD ranging from 0% to 800% and a mean value of 30%. In contrast, the automated workload did not strongly affect RF values.
Regarding imaging reconstruction, the most crucial parameters were algorithm and kernel, with RF variations, in terms of relative discrepancy and RSD, up to 600% and 400% and mean values of 50% and 20%, respectively.
The inter-reader variation in contouring was overall the largest source of variability, with mean and maximum values of 60% and 1,000%, respectively.
In the RF extraction step, interslice resampling did not appear to be a useful solution, while the choice of the feature category parameters was the most critical point to standardise in the radiomic workflow. In addition, changes in these parameter values unpredictably affect the variability caused by the other parameters.
Overall, seven textural RF (out of 32) showed a variability within 10% for all the analysed issues, and 19 RF did not exceed 20%.
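The screening described above can be sketched as follows, assuming that a feature counts as stable when its worst-case variability across all analysed sources stays at or below a threshold. The feature names, source labels, and percentages are hypothetical placeholders, not results from this study.

```python
# Hypothetical worst-case variability (%) per feature and per source of variability.
variability = {
    "feature_A": {"acquisition": 4.0, "reconstruction": 8.5, "segmentation": 9.0, "extraction": 3.2},
    "feature_B": {"acquisition": 12.0, "reconstruction": 18.0, "segmentation": 15.5, "extraction": 6.0},
    "feature_C": {"acquisition": 30.0, "reconstruction": 250.0, "segmentation": 60.0, "extraction": 45.0},
}


def stable_features(table, threshold):
    """Return the features whose variability is <= threshold for every source."""
    return sorted(
        name
        for name, by_source in table.items()
        if max(by_source.values()) <= threshold
    )


within_10 = stable_features(variability, 10.0)
within_20 = stable_features(variability, 20.0)

print("Within 10%:", within_10)
print("Within 20%:", within_20)
```

Taking the maximum over sources is a deliberately conservative choice: a feature is retained only if no single step of the workflow pushes it beyond the threshold.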
CONCLUSION
A phantom study is a preparatory step for determining an optimal workflow that maximises RF predictivity on patients. Some sources of variability can be limited or removed through standardisation, especially in the imaging reconstruction and RF extraction steps. Nevertheless, the issues due to acquisition and inter-reader variability remain. In the acquisition step, the main effect on RF stability is attributable to interscanner reproducibility, which at present is unavoidable in multicentre studies. Inter-reader variability, on the other hand, might be limited through automatic segmentation tools. Finally, the sources of variability and the different behaviour of RF must be evaluated according to the trial characteristics in order to find a compromise between stability and predictivity.