Abstract: |
The standard diagnostic procedure for targeted therapies in lung cancer treatment involves cancer detection, histological subtyping, and subsequent detection of key driver mutations, such as those in the epidermal growth factor receptor (EGFR) gene. Although molecular profiling can uncover the driver mutation, the process is expensive and time-consuming. Deep learning-based image analysis offers a more economical alternative for discovering driver mutations directly from whole slide images (WSIs) of tissue samples stained with hematoxylin and eosin (H&E). In this work, we used customized deep learning pipelines with weak supervision to identify the morphological correlates of EGFR mutation from H&E-stained WSIs, in addition to detecting tumors and subtyping them histologically. We demonstrate the effectiveness of our pipeline through rigorous experiments and ablation studies on two lung cancer datasets: The Cancer Genome Atlas (TCGA) and a private dataset from India. With our pipeline, we achieved an average area under the curve (AUC) of 0.964 for tumor detection and 0.942 for histological subtyping between adenocarcinoma and squamous cell carcinoma on the TCGA dataset. For EGFR detection, we achieved an average AUC of 0.864 on the TCGA dataset and 0.783 on the dataset from India. Our key findings are the following. First, there is no particular advantage in using feature extractor layers trained on histology if there are differences in magnification. Second, selecting patches with high cellularity, presumably capturing tumor regions, is not always helpful, as the sign of a disease class may be present in the tumor-adjacent stroma. Finally, color normalization remains an alternative worth trying alongside color jitter, even though the two techniques originate from opposing approaches to dealing with stain color variation. |