Generative AI is expected to play a key role in transforming radiology workflows, especially for routine tasks such as chest radiograph interpretation. A recent study investigated the diagnostic accuracy and clinical value of a domain-specific multimodal generative AI model for generating preliminary chest radiograph reports.
The AI model was trained on over 8 million radiograph-report pairs and tested on a set of radiographs from public datasets. The AI-generated reports were compared with radiologist annotations, focusing on the detection of 13 clinical findings and overall report quality. The model exhibited high sensitivity, particularly for critical conditions such as pneumothorax and subcutaneous emphysema, with sensitivities of 95.3% and 92.6%, respectively. Across all clinical findings, the model's overall sensitivity was 83.2% (1,821 of 2,190).
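As a rough illustration of the arithmetic behind the overall figure, sensitivity is the share of actual findings that the model detected. The sketch below uses only the counts reported above; the function name and its argument names are illustrative assumptions, not taken from the study.

```python
def sensitivity(detected: int, actual_positives: int) -> float:
    """Fraction of actual positive findings that were detected.

    Hypothetical helper for illustration; not part of the study's code.
    """
    if actual_positives <= 0:
        raise ValueError("actual_positives must be a positive integer")
    if not 0 <= detected <= actual_positives:
        raise ValueError("detected must lie between 0 and actual_positives")
    return detected / actual_positives


# Counts reported for all 13 clinical findings combined.
overall = sensitivity(detected=1821, actual_positives=2190)
print(f"Overall sensitivity: {overall:.1%}")
```

The remaining 369 findings (2,190 minus 1,821) correspond to the false negatives mentioned among the study's limitations.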
In terms of report quality, the acceptance rate of reports generated by the domain-specific AI model was 70.5% (6,047 of 8,580), compared to 73.3% (6,288 of 8,580) for radiologist reports and 29.6% (2,536 of 8,580) for reports generated by GPT-4Vision. The model’s reports scored at least as high as radiologists’ reports on agreement and quality metrics. Agreement scores for the model-generated reports had a median of 4 (IQR 3–5), compared to 3 (IQR 2–5) for radiologists’ reports and 1 (IQR 1–3) for GPT-4Vision reports (P<0.001). Quality scores followed a similar pattern, with a median of 4 (IQR 3–5) for the model-generated reports, compared to 4 (IQR 2–5) for radiologists’ reports and 2 (IQR 1–3) for GPT-4Vision reports (P<0.001). Furthermore, a comparative ranking analysis showed that the AI model-generated reports were most frequently ranked first (60.0%) by four readers (radiologists who were not involved in other parts of the study), while radiologists’ reports were most commonly ranked second (54.7%), and GPT-4Vision reports were ranked lowest most often (73.6%).
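The acceptance rates can be reproduced from the reported counts. This is a minimal sketch that assumes all three report types share a denominator of 8,580 ratings, the value at which each count matches its stated percentage. The helper name and dictionary labels are illustrative assumptions.

```python
# Assumed shared denominator: total ratings per report type.
RATINGS_PER_REPORT_TYPE = 8580

# Accepted-report counts as reported for each source.
accepted = {
    "domain-specific model": 6047,
    "radiologist": 6288,
    "GPT-4Vision": 2536,
}


def acceptance_rates(accepted_counts: dict[str, int], total: int) -> dict[str, float]:
    """Return the accepted fraction for each report source (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be a positive integer")
    return {name: count / total for name, count in accepted_counts.items()}


rates = acceptance_rates(accepted, RATINGS_PER_REPORT_TYPE)

# Print sources from highest to lowest acceptance rate.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.1%}")
```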
In conclusion, this study highlights the clinical value of a domain-specific generative AI model in producing preliminary chest radiograph reports with high diagnostic accuracy and quality. While the model shows considerable promise, further development is needed to address limitations such as false positives and negatives, and to improve the model’s integration into clinical practice.
Reference
Hong EK et al. Diagnostic accuracy and clinical value of a domain-specific multimodal generative AI model for chest radiograph report generation. Radiology. 2025;314(3). DOI: 10.1148/radiol.241476.