Artificial classification of cervical squamous lesions in ThinPrep cytologic tests using a deep convolutional neural network
Affiliations: Department of Digital Medicine, School of Biomedical Engineering and Medical Imaging, Third Military Medical University (Army Medical University), Chongqing 400038, P.R. China, Department of Pathology, Third Affiliated Hospital, Third Military Medical University (Army Medical University), Chongqing 400042, P.R. China, Department of Medical Engineering, Second Affiliated Hospital, Third Military Medical University (Army Medical University), Chongqing 400037, P.R. China
- Published online on: August 12, 2020 https://doi.org/10.3892/ol.2020.11974
- Article Number: 113
Copyright: © Liu et al. This is an open access article distributed under the terms of Creative Commons Attribution License.
This article is mentioned in:
Cervical cancer is one of the most common malignant tumors in women worldwide and in China (1), 80% of which is made up of the cervical squamous cell carcinoma (2). In 2018, there were 106,430 new cases and 47,739 deaths due to cervical cancer in China (3). The average morbidity and mortality are on the increase with annual rates of 8.7% and 8.1%, respectively (4). However, precursor lesions with regards to the diagnosis of cervical cancer could reduce its incidence and improve the patient's quality of life.
Effective early screening technology that detects precursor lesions includes the Pap smear and the ThinPrep Cytologic Test (TCT). Previous reports showed that the TCT has higher sensitivity (5), and is one of the most common cervical cancer screening methods (6). The diagnosis from TCT samples is mainly dependent on a pathologist's interpretation under a microscope, according to the Bethesda report system (7). This system defines six categories of cervical squamous cells: Negative for intraepithelial lesion or malignancy (NILM), atypical squamous cells-unclear meaning (ASC-US), atypical squamous cells-do not rule out high-grade squamous intraepithelial lesions (ASC-H), low-grade squamous intraepithelial lesions (LSIL), high-grade squamous intraepithelial lesions (HSIL), and squamous cell carcinoma (SCC). The first category, NILM, indicates the presence of normal cells, while the remaining five categories indicate the presence of abnormal cells. The degree of morphological abnormality is the key to defining the lesion grade, including the shape, size, and boundary of the nucleus and the characteristics of the chromatin. However, accurate diagnosis of cervical squamous cells by pathologists is time-consuming and subjective, with the classification of samples influenced by cellular complexity and the pathologist's previous experience interpreting TCT samples.
Advances in artificial intelligence (AI) technology, especially the deep convolution neural networks (DCNN), has obtained great success in medical research, including skin lesion classification (8) and acute lymphoblastic leukemia diagnosis (9). The DCNN often contains several layers with convolutional kernels, similar to a mathematical model that contains numerous parameters; thus, a large amount of data is needed to train kernels. Unlike traditional algorithms (10), DCNN is not limited to features that are extracted based on the experience of expertise. Instead, it uses the original images as input, automatically extracts a large number of hierarchical features, and finally generates a diagnosis as an output. Many researchers have focused on digital cervical pathology, where DCNN is used to provide a diagnosis of cervical cancer (11–16). Abdulkerim et al (11) and Chankong et al (12) segmented the cervical cells for diagnosis. Zhang et al (13) and Wu et al (14) classified the cervical lesion grades in single cell images. Tian et al (15) and Liu et al (16) simultaneously detected a lesion location and classified the lesion degree in whole pathological images without prior cell segmentation or cropping.
Although previous studies have advanced the use of DCNN in cervical cancer screening, the studies face three major limitations. First, those studies usually focused on PAP-smear images (11–14). This early-screening method has been largely replaced by TCT-staining, but researchers have not yet examined the use of DCNN to diagnose cervical cancer from TCT images. Second, to improve and evaluate the comprehensiveness and robustness of DCNN models, multi-center datasets, which includes data from different levels of hospitals, are needed. However, previous studies have only included data from one center (14–15). Third, in order to analyze the results of the AI diagnoses, researchers should compare the results with those from the manual interpretation completed by pathologists. However, few studies have generated such comparisons (17).
The aim of the present study was to address the limitations of previous studies that have examined the utility of AI in cervical cancer detection by building a TCT dataset from samples collected from different hospitals and creating an automated pipeline for the normal and abnormal cervical squamous cell classification using VGG16, which is a popular DCNN model. The results were compared with the classifications provided by two different pathologists.
Materials and methods
TCT image collection
TCT specimens were obtained from a total of 82 patients in four hospitals located in the city of Chongqing, China (Table I).
The inclusion criteria in this study were: i) Patients aged ≥18 years, ii) satisfactory TCT specimens that contained ≥2,000 squamous epithelial cells with clear morphologies, and iii) the diagnostic results of the TCT had to be one of the precursor lesions subtypes or cervical cancer. This study was approved by the Ethics Committee of the Army Military Medical University (approval no. KY201774) and was conducted in accordance with the Helsinki Declaration. The study was explained to all the patients, and oral informed consent was obtained from them for their sample to be used for scientific research by phone when images were collected. Written patient consent was not required, according to the guidance of the ethics committee.
A total of 1736 abnormal cells and 1554 normal cell samples were obtained, which were randomly divided into training (n=2669) and test (n=621) datasets at a ratio of 4:1. All images were in the BMP file format.
TCT specimens were digitized to images using a microscope with a 40× objective lens (model: VS120, Brand: Olympus) and cut into 2048×2048 pixels for labeling (Fig. 1). Two clinical pathologists independently labelled the squamous epithelial cells with bounding boxes as NILM, ASC-US, LSIL, ASC-H, HSIL, and SCC using LabelImg software, which is a graphical image annotation tool (https://github.com/tzutalin/labelImg), according to the Bethesda reporting system. After labeling, all data were split into single-cell images according to the labeled coordinates of bounding boxes and rescaled to 224×224 pixels according to the shortest side (Fig. 1). To reduce the inter- and intra-observer variability, an annotated review was conducted of the original pixel images of the single cells one month following the initial labeling.
TCT image preprocessing procedures. (A) Large-scale digital pathological image. (B) Labeling the cells in the 2048×2048 images cropped from (A). (C) Single cells split from the 2048×2048 images. (D) Rescaled single-cell images (224×224).
In order to prevent the problem of over-fitting, where the model may not accurately predict additional data, data augmentation on the training set using image flipping was performed. Data normalization was completed to ease the redundant image differences caused by the different environments and staining workflow of multiple hospitals (18).
Development of the DCNN model
The development of the DCNN model followed a protocol of training and then testing (Fig. 2). The results of the model were compared with that independently obtained by two experienced pathologists. An Intel Core I7-7800X 3.50GHz * 12 with four Titan X GPUs using Python 3.5 (https://www.python.org) and the TensorFlow (https://www.tensorflow.org) library for all experiments were used. The number of training iterations was set to 2,000, and the initial learning rate was set to 0.01. The optimizer was selected via the Batch Gradient Descent.
Workflow of this study. DCNN, deep convolutional neural network; ETS, ensemble training strategy.
Through previous literature and preliminary experiments, the VGG16 model demonstrated a balance in framework, accuracy, computational efficiency and proven performance in the medical field, and was chosen for our experiments (19–21). The model comprises 16 layers, including 13 convolutional layers with 3×3 convolution kernels and 3 fully connected layers. At the same time, there are also five max-pooling layers for feature dimensionality reduction and two drop-out layers to prevent over-fitting. First, the original single-cell images were input into the network and represented by a matrix. Then, the convolutional layer and the max-pooling layer extracted the deep features and mapped them onto a hidden layer. The fully connected layers distributed the features onto the sample tag space. Finally, the last fully connected layer with a softmax classifier was used to predict the classification probability (Fig. 3). In our experiment, the activation functions were measured as Rectified Linear Units (ReLU).
Structure of the DCNN model. The DCNN model includes 13 convolutional layers with 3×3 kernel (the black squares), and three fully connected layers (the black rectangles). ReLU, rectified linear units; conv, convolutional layer; Fc, fully connected layer.
An ensemble training strategy (ETS) was proposed based on 5-fold cross-validation and was employed in our training stage. In our strategy, the training set was randomly divided into five equal parts, 4/5 for training and the remaining 1/5 for validation. This division was repeated five times, and the previous model was used as a pre-training model for the next iteration each time. The last repeated model parameter indicated the final result.
Transfer learning was used to improve the generalizability of the model (22). The dataset was used to fine-tune the DCNN model that was trained on the ImageNet dataset (23). To reduce computational complexity, the first seven layers of the DCNN network were frozen, and the remaining nine layers were trainable in the training stage.
We further improved the model generalizability and prevented over-fitting with early-stopping (24,25). The classical stochastic gradient descent optimizer was selected, and the learning rate changed adaptively with the introduction of a variable factor G, defined as:G:=(Lti-Lvi)2ddLVi
where, Lit indicates the loss value at the ith iteration in the training stage, and Liv stands for the loss value in the ith iteration in the validation stage. When G is greater than a threshold (set to 5 in our experiment), the learning rate is adjusted to 0.1 times the current learning rate.
The performance of the proposed DCNN model was evaluated by measuring its accuracy, precision, sensitivity, F1 values, and speed. Accuracy was defined as the proportion of the correct classified samples of the total samples, and precision was defined as the ratio of the positive sample that was correctly classified. Sensitivity was expressed as the ratio of negative samples, which were correctly classified as the total negative samples. F1 reflected the performance of the model and is a composite index that reflects both sensitivity and specificity. Speed was defined as the average time that one cell was identified, and it was calculated to evaluate the efficiency of the DCNN model. In addition, the independent component analysis (ICA), which is a popular method to represent multivariate data (26), was used to visualize the features of cells in the test dataset before and after classification by the DCNN model (27).
Two experienced pathologists (pathologist 1 and 2, 12-years and 3-years working experience, respectively) were invited to classify the squamous epithelial cells independently, and their performance was compared with that of the DCNN models.
The comparison of the classification accuracy between the DCNN models and the pathologists was conducted by the paired sample t-tests. The statistical analyses were performed using SPSS software (version 16.0). P<0.05 was considered to indicate a statistically significant difference.
DCNN models and classification accuracy
The accuracy, precision, sensitivity, and F1 of the DCNN model with ETS achieved slightly better performance in identifying lesion cells in the test dataset than those of the model without ETS and the pathologists (Table II). The training strategy improved the accuracy by 1.29%, the precision by 1.22%, the sensitivity by 1.12%, and the F1 value by 1.32%. Compared with Pathologist 1, the DCNN model with ETS had 0.80% lower accuracy, 1.13% lower precision, 0.80% lower sensitivity, and 0.86% lower F1 value. Compared with Pathologist 2, the DCNN model with ETS had 0.33% higher accuracy, 0.85% higher precision, 1.47% higher F1 value, but 0.5% lower sensitivity. When the accuracy was examined by lesion subtype, the differences between the accuracy of the models and that of each pathologist were not significant (Table II; DCNN without ETS vs. DCNN with ETS: P0=0.387>0.05, DCNN with ETS vs. Pathologist 1: P1=0.771>0.05, DCNN with ETS vs. Pathologist 2: P2=0.489>0.05). The accuracy rate of the DCNN model with ETS varied by lesion subtype, with ASC-H and HSIL having the highest accuracies (100%) and ASC-US having the lowest (75%; Table III).
DCNN models and classification efficiency
In the classification efficiency, the results showed that the DCNN model with ETS achieved the fastest speed (0.0732 sec; Table II), and it was 0.0036 sec faster compared with that without ETS, 0.3906 sec faster compared with Pathologist 1, and 0.4260 sec faster compared with Pathologist 2 (Table II). The classification speed of the DCNN models was almost six times faster than that of the pathologists.
We visualized the loss and accuracy values during the training stage of the DCNN models. The final loss values were both <0.1, and the accuracy values were >0.9 after several iterations (Fig. 4). However, the values for the DCNN with ETS were always lower and smoother than that without ETS, indicating that the DCNN model with the ETS had a more stable performance (Fig. 4).
Visualization of (A) loss and (B) accuracy values during the training stage. The red line shows the data for the DCNN model with ETS. The blue line showsthe data for the DCNN model without ETS.
DCNN model and ICA
Visualization of the test dataset using ICA revealed that the normal and abnormal cells were separated into two distinct classes after identification with the DCNN model with ETS (Fig. 5). Simultaneously, we randomly showed some cells which were classified correctly and put into 2048×2048 images (Fig. 6). It indicated that the DCNN model could classify the cells which had unclear nuclei, even in the cell cluster.
Visualization of the (A) origin data and (B) the extracted features by the DCNN model with ETS in the test dataset. The blue dots are the normal cells, and the green dots the abnormal cells.
Cells correctly classified by the DCNN model with ETS in the test dataset. (A-C) Corrected-classified cells in the 2048×2048 images, the second line shows the details of the cells above.
In the present study, we developed a DCNN model based on VGG16 for the automated classification of cervical squamous cells that accurately differentiated between normal cells and lesion cells in a multi-center TCT dataset. Compared to classifications independently provided by two experienced pathologists, the AI system achieved similar classification accuracy with improved efficiency.
Our datasets were composed of many TCT images that originated from four different hospitals that differed in their trauma levels and TCT machines. These sources of variation provided diversity within the dataset that was used to train and optimize the DCNN algorithm, thereby improving the generalizability of this model.
Relative to the large-scale dataset, like the millions of ImageNet datasets (23), required for the development of most DCNN models in the training stage, our dataset is small. Therefore, we employed ETS based on 5-fold cross-validation, which improved the accuracy and speed compared with the traditional DCNN model. At the same time, in terms of model performance, the loss value obtained by the DCNN model with ETS could be fitted better and faster than the model without ETS. Even though the inclusion of ETS did not significantly improve the accuracy, the incremental increase in accuracy may still be clinically important. Future investigations will continue to verify the classification effect of this method for future studies.
The classification accuracy of the DCNN model with ETS was 98.07% for our multi-center dataset. The ICA analysis found that the characteristics of normal squamous epithelial cells before classification were more concentrated, and the characteristics of abnormal cells were scattered and polymorphic. This observation may be mainly due to the presence of many subtypes in our test dataset, including ASC-US, LSIL, ASC-H, HSIL, and SCC. The characteristics of some abnormal cells are very similar to each other. Meanwhile, the features of normal and abnormal cervical squamous epithelial cells classified by the DCNN model were clearly distinguished, which indicates that the DCNN could accurately classify cervical squamous epithelial cells with fine-grained characteristics.
The DCNN model with ETS differed in classification accuracy among lesion subtypes. SCC is the most direct evidence for the diagnosis of cervical cancer (7). Compared with other precancerous cells, SCC contains significant nuclear atypia, yet classifications with our model only achieved 85.71% accuracy. This lower accuracy may be due to the significantly smaller number of SCC cells than the number of other classes in our training dataset (n=39), which is called class-imbalance problem. The low classification accuracy for ASC-US, 75%, may be due to the high similarity of features between ASC-US, NSIL, and LSIL. In fact, even experienced pathologists can experience difficulty distinguishing among these subtypes. Observation of the cells which were correctly classified demonstrated that even though the cells were isolated from cell clusters, were small, had difficulty in distinguishing deep nuclear staining, or had unclear nuclei, the model could still accurately classify these samples.
The DCNN model developed here provides a similar classification performance to that of the pathologists, but with a highly improved classification efficiency. This result indicates that our model could assist pathologists with the interpretation of TCT specimens and reduce their workload. In the future, this highly efficient automated method may be widely used in clinical settings to overcome the subjective and time-consuming nature of the manual diagnosis.
Researchers have previously proposed DCNN models to classify samples of cervical cells. Zhang et al (13), used a DCNN to classify normal and abnormal cervical cells and found accuracies of 98.3 and 98.6% for the two public datasets, Herlev dataset and HEMLBC dataset, respectively. However, the testing time for one cervical cell averaged 3.5 sec. Wu et al (14), used a DCNN to classify keratinizing squamous, non-keratinizing squamous, and basaloid squamous cells in a single-center dataset, and achieved an accuracy of 93.33% (speed was not reported). While the accuracies reported in these two studies were high, the authors did not externally validate the results by comparing classification with those provided by pathologists. Furthermore, the diagnosis efficiency reported by Zhang et al (13) did not satisfy the need of the clinic.
In summary, our study focused on developing an accurate and efficient automated method that could be used in clinics to assist pathologists in interpreting TCT early-screenings. There are four main limitations of the method employed in the present study: i) We used images of single cells that were extracted from whole slide images in advance. Future studies should focus on detecting the cervical squamous lesion cells in whole slide images without any prior segmentation or splits. ii) Our study only included a binary classification of cervical squamous cells. Future studies should focus on a multi-classification model for cervical squamous cells. iii) The small size of our dataset and the class-imbalance problem have a negative effect on the results, thus more data should be collected from clinics, especially on ASC-US, ASC-H, and SCC cells in the future. iv) Our data was collected from the southwestern region of China, and this geographically limited dataset may limit the generalizability of the resulting model. Future studies should focus on collecting data globally.
The present study was supported by the National Key Research and Development Plan (grant no. 2016YFC0106403), the National Natural Science Foundation of China (grant nos. 31771324 and 61701506), the Military Medical Research Program Youth Development Project (grant nos. 16QNP100 and 2018XLC3023) and the Graduate Education and Teaching Reform Project of Chongqing (grant no. yjg183144).
Availability of data and materials
The datasets analyzed during the present study are not publicly available as they are part of an ongoing study.
LL conceived the study, analyzed the data, performed the statistical analysis and drafted the initial manuscript. LL and JX created the DCNN model and analyzed the data. YW and QM collected the TCT dataset and performed pathological analysis. LT and YW were involved in project development and critically reviewed the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
The present study was approved by the Ethics Committee of the Army Military Medical University (Chongqing, China; approval no. KY201774). The study was explained to all the patients, and oral informed consent was obtained from them for their sample to be used for scientific research by phone when images were collected. Written patient consent was not required, according to the guidance of the ethics committee.
Patient consent for publication
The authors declare that they have no competing interests.
Small W Jr, Bacon MA, Bajaj A, Chuang LT, Fisher BJ, Harkenrider MM, Jhingran A, Kitchener HC, Mileshkin LR, Viswanathan AN and Gaffney DK: Cervical cancer: A global health crisis. Cancer. 123:2404–2412. 2017. View Article : Google Scholar : PubMed/NCBI
Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA and Jemal A: Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 68:394–424. 2018. View Article : Google Scholar : PubMed/NCBI
Chacho MS, Mattie ME and Schwartz PE: Cytohistologic correlation rates between conventional Papanicolaou smears and ThinPrep cervical cytology: A comparison. Cancer. 99:135–140. 2003. View Article : Google Scholar : PubMed/NCBI
Behtash N and Mehrdad N: Cervical cancer: screening and prevention. Asian Pac J Cancer Prev. 7:683–686. 2006.PubMed/NCBI
Ritu N and Wilbur D: The Bethesda System for reporting cervical cytology: Definitions, criteria, and explanatory notes. Springer International Publishing; Berlin: 2015, PubMed/NCBI
Xie Y, Zhang J, Xia Y and Shen C: A mutual bootstrapping model for automated skin lesion segmentation and classification. IEEE Trans Med Imaging. 39:2482–2493. 2020. View Article : Google Scholar : PubMed/NCBI
Gehlot S, Gupta A and Gupta R: SDCT-AuxNet°: DCT augmented stain deconvolutional CNN with auxiliary classifier for cancer diagnosis. Med Image Anal. 61:1016612020. View Article : Google Scholar : PubMed/NCBI
Abdulkerim C, Abdülkadir A, Aslı Ü, Nurullah Ç, Gökhan B, Ilknur T, Aslı Ç, Behçet UT and Lütfiye DA: Segmentation of precursor lesions in cervical cancer using convolutional neural networks. 25th Signal Processing and Communications Applications Conference (SIU). 2017.
Chankong T, Theera-Umpon N and Auephanwiriyakul S: Automatic cervical cell segmentation and classification in Pap smears. Comput Methods Programs Biomed. 113:539–556. 2014. View Article : Google Scholar : PubMed/NCBI
Zhang L, Lu L, Nogues I, Summers RM, Liu S and Yao J: DeepPap: Deep convolutional networks for cervical cell classification. IEEE J Biomed Health Inform. 21:1633–1643. 2017. View Article : Google Scholar : PubMed/NCBI
Wu M, Yan C, Liu H, Liu Q and Yin Y: Automatic classification of cervical cancer from cytological images by using convolutional neural network. Biosci Rep. 38:BSR201817692018. View Article : Google Scholar : PubMed/NCBI
Miyagi Y, Takehara K, Nagayasu Y and Miyake T: Application of deep learning to the classification of uterine cervical squamous epithelial lesion from colposcopy images combined with HPV types. Oncol Lett. 19:1602–1610. 2020.PubMed/NCBI
Weiss S, Xu ZZ, Shyamal P, Amir A, Bittinger K, Gonzalez A, Lozupone C, Zaneveld JR, Vázquez-Baeza Y, Birmingham A, et al: Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 5:272017. View Article : Google Scholar : PubMed/NCBI
Sharma S and Mehra R: Conventional machine learning and deep learning approach for multi-classification of breast cancer histopathology images-a comparative insight. J Digit Imaging. 33:632–654. 2020. View Article : Google Scholar : PubMed/NCBI
Taqi AM, Awad A, Al-Azzo F and Milanova M: The impact of multi-optimizers and data augmentation on TensorFlow convolutional neural network performance. IEEE Conference on Multimedia Information Processing Retrieval. 140–145. 2018.