Background
Radiologist fatigue is a significant contributor to diagnostic errors, which have serious consequences for patient outcomes. Outside the field of radiology, both eye gaze and speech patterns have been studied as possible indicators of fatigue. SwRI used a dataset of speech, gaze, and fatigue survey data, collected from 23 radiologists as they read chest radiographs containing potential nodules, to build machine learning models that predict fatigue.
Approach
Radiologist performance was measured as the intersection over union of each radiologist's annotations against ground truth. The Swedish Occupational Fatigue Inventory measured self-reported fatigue on five dimensions: lack of energy, physical exertion, physical discomfort, lack of motivation, and sleepiness. We built custom tools to extract salient gaze features (i.e., saccades, fixations, and pupillometry parameters) and speech features (i.e., frequency, energy, and spectral parameters). Each feature set was used independently to compare the mean subject-fold cross-validation performance of four models: a 12-layer feed-forward neural network, a support vector classifier, linear regression, and DeepConvLSTM. Combinations of speech, gaze, and fatigue features were then used to predict radiologist performance and fatigue, in order to determine the best model and feature set.
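As an illustration of two steps in this approach, the sketch below computes intersection over union for a pair of annotations and builds subject-wise cross-validation folds. It is a minimal sketch under stated assumptions, not the project's implementation: it assumes annotations are axis-aligned boxes given as (x_min, y_min, x_max, y_max), and it assumes "subject-fold" means one held-out fold per radiologist. The function names box_iou and subject_folds are hypothetical.

```python
from collections import defaultdict


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes.

    Assumption: each box is (x_min, y_min, x_max, y_max).
    """
    # Corners of the overlapping rectangle, if any.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def subject_folds(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices).

    Assumption: one fold per subject, so no subject's samples
    appear in both the training and the test split.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for sid, test_idx in by_subject.items():
        train_idx = [i for i, s in enumerate(subject_ids) if s != sid]
        yield sid, train_idx, test_idx


if __name__ == "__main__":
    # Made-up boxes and subject labels, for illustration only.
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    for held_out, train, test in subject_folds(["r1", "r1", "r2", "r3"]):
        print(held_out, train, test)
```

Splitting by subject rather than by sample keeps each radiologist's readings out of the training data for the fold in which that radiologist is evaluated, which is the usual reason for subject-wise folds.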
Accomplishments
We analyzed radiologist performance on the pulmonary nodule detection task to identify diagnostic errors and extract gaze and speech features. Statistical analysis of fatigue surveys revealed that physical discomfort was significantly correlated (Recall r=0.31, p>0.5 and F1 r=0.31, p<0.5) with task performance. Among the models trained individually on gaze and speech features, the 12-layer feed-forward neural network performed best. When the 12-layer feed-forward neural network was trained to predict performance and fatigue on combinations of these feature sets (gaze, speech, gaze-speech, gaze-fatigue, speech-fatigue, gaze-speech-fatigue), the highest-performing model used the combined gaze and fatigue (gaze-fatigue) features. The model that predicted performance had a mean accuracy of 0.74, a standard deviation of 0.10, and a maximum accuracy of 0.89 across all subject folds. Additionally, the model that predicted fatigue (excluding physical discomfort) had a mean accuracy of 0.98 and a standard deviation of 0.04.
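The mean, standard deviation, and maximum accuracy reported across subject folds can be computed from a list of per-fold accuracies, as in the minimal sketch below. The function name summarize_folds is hypothetical, the input values are made-up placeholders rather than the project's results, and the use of the population standard deviation is an assumption, since the summary does not say which estimator was used.

```python
from statistics import mean, pstdev


def summarize_folds(accuracies):
    """Summarize per-fold accuracies as mean, standard deviation, and maximum.

    Assumption: population standard deviation (pstdev); swap in
    statistics.stdev if the sample estimator is wanted instead.
    """
    return {
        "mean": mean(accuracies),
        "std": pstdev(accuracies),
        "max": max(accuracies),
    }


if __name__ == "__main__":
    # Placeholder accuracies, one per held-out subject (not real results).
    print(summarize_folds([0.5, 0.75, 1.0]))
```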