Performance Analysis of Different Convolutional Neural Network (CNN) Models with Optimizers in Detecting Tuberculosis (TB) from Various Chest X-ray Images

— Tuberculosis (TB) is among the top 10 causes of death from infectious disease. This paper uses Convolutional Neural Networks (CNNs) to investigate the accuracy and performance of three pre-trained models with different optimizers and loss functions in diagnosing tuberculosis from patients' chest X-ray scans. Confusion matrices and precision, recall, and F1-score values have also been recorded to assess, from several angles, how accurately each model predicts the disease.


A. Brief Overview of Tuberculosis
Tuberculosis is an infectious, sometimes lethal disease caused by the bacterium Mycobacterium tuberculosis (MTB) [1]. Tuberculosis is mostly a lung disease, although it can also affect other regions of the body [1]. When an infection does not cause symptoms, it is referred to as latent tuberculosis.

C. Different Types of TB Tests
1) Mantoux tuberculin skin test (TST)
The tuberculin skin test is one of the few examinations from the nineteenth century still rigorously used as an essential test for TB diagnosis. Although it is broadly utilized by physicians all around the world, its interpretation is challenging and often contradictory [7].
2) Blood test
TB blood tests are also known as interferon-gamma release assays, or IGRAs. Two TB blood tests approved by the US Food and Drug Administration (FDA) and available in the US are the QuantiFERON-TB Gold Plus (QFT-Plus) and the T-SPOT.TB test [8].

3) Imaging test
If the skin test is positive, the doctor will almost certainly order a chest X-ray or a CT scan. The images may show white spots in the lungs where the immune system has fought off the TB infection, or abnormalities in the lungs produced by active tuberculosis [9].

4) Sputum test
If a chest X-ray reveals tuberculosis, a doctor may take sputum samples (the mucus that comes up when you cough). The samples are tested for tuberculosis bacteria. Drug-resistant TB can also be detected in sputum samples, which helps the doctor choose the drugs most likely to work [9].

D. TB disease situation in the world
Notifications of tuberculosis cases have dropped dramatically: the most visible effect of the COVID-19 pandemic disruptions on tuberculosis is a massive global drop in the number of people newly diagnosed with tuberculosis and reported in 2020 compared to 2019 (Fig. 1). Following strong expansion from 2017 to 2019, the worldwide number fell by 18% between 2019 and 2020, from 7.1 million to 5.8 million [10].

E. Deep Learning in Healthcare
Deep Learning is gaining popularity in the medical industry. It demonstrates predictive capabilities suited to effectively analyzing complicated medical data. With artificial intelligence (AI) becoming integrated into many disciplines of medicine, it is critical for healthcare practitioners to grasp its potential and limits. Its expanding popularity and presence in healthcare have resulted in considerable media coverage, allowing a greater percentage of our society to become aware of its tremendous potential to assist. In healthcare reform, obtaining information and meaningful insights from complex, high-dimensional, and heterogeneous biological data remains a fundamental challenge. Several types of data have emerged in current biomedical research, including electronic health records, imaging, sensor data, and text, all of which are complex, diverse, poorly annotated, and often unstructured. Traditional data mining and statistical learning techniques often require feature engineering to build meaningful and durable features from data, followed by the creation of prediction or clustering models on top of them. Both processes present various challenges when dealing with complex data and a lack of subject expertise [11]. Healthcare's future has never been more promising. Not only can AI and ML enable the development of solutions that address very particular industrial demands, but deep learning in healthcare also has the potential to be extremely powerful in aiding doctors and revolutionizing patient care [12].

F. Convolutional Neural Network
Convolutional Neural Networks (CNNs) are a widely used type of deep neural network. The name comes from convolution, a linear mathematical operation between matrices. A CNN is built from convolutional layers, non-linearity layers, pooling layers, and fully-connected layers. Convolutional and fully-connected layers have trainable parameters, whereas pooling and non-linearity layers do not. CNNs perform admirably on many machine-learning problems [13].
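The three parameter-free and parameter-bearing building blocks above can be sketched in plain NumPy; this is an illustrative toy stage (one convolution, one ReLU non-linearity, one pooling step), not the architecture used in the experiments:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in CNN libraries)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linearity layer: element-wise max(x, 0)."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Max pooling with a size x size window and matching stride."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 "image" passed through one conv -> ReLU -> pool stage.
img = np.arange(36, dtype=float).reshape(6, 6)
edge = np.array([[1.0, -1.0], [1.0, -1.0]])  # simple vertical-edge kernel
feat = max_pool(relu(conv2d(img, edge)))
print(feat.shape)  # (2, 2)
```

Only `conv2d`'s kernel holds learnable parameters here; `relu` and `max_pool` have none, which is why those layers add no weights to a CNN.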
II. RELATED WORK

Mostofa Ahsan, Rahul Gomes, and Anee Denton investigated whether CNNs could be a good alternative to decision-tree-based medical image categorization systems. They applied CNNs to a dataset of chest X-rays (CXRs) to determine whether a patient had tuberculosis (TB), comparing against the traditional decision-tree method. Because CNNs have multiple hidden layers with filters, their model attained an accuracy of 80 percent without augmentation and 81.25 percent with augmentation. They employed a total of 1324 CXR images from the Shenzhen dataset and 276 images from the Montgomery dataset [14].

Rahul Hooda, Sanjeev Sofat, Simranpreet Kaur, Ajay Mittal, and Fabrice Meriaudeau investigated the problem using a CNN architecture with seven convolutional layers and three fully-connected layers. The performance of three distinct optimizers was compared; the Adam optimizer fared the best, with an overall accuracy of 94.73 percent and validation accuracy of 82.09 percent. All results were obtained on the Montgomery and Shenzhen datasets [15].

Chang Liu, Yu Cao, Marlon Alcantara, Benyuan Liu, Maria Brunette, Jesus Peinado, and Walter Curios proposed a new CNN-based strategy for dealing with unbalanced, less-represented categories of X-ray images, which diagnoses multiple TB manifestations with high accuracy. On a huge TB image collection, they attained an accuracy of 85.68 percent [16].

Mustapha Oloko-Oba and Serestina Viriri used a computer-aided detection model based on deep convolutional neural networks to detect tuberculosis from Montgomery County (MC) radiographs. Their proposed model reached a maximum validation accuracy of 87.1 percent [17].

Three separate proposals for the use of pre-trained CNNs in tuberculosis detection were provided by U.K. Lopes and J.F. Valiati. In the first proposal, three distinct CNN architectures are employed to extract features from a scaled radiography image.
An SVM classifier is then trained using the retrieved features. In the second proposal, the same three CNN architectures are utilized to extract features from CR sub-regions; the retrieved features are then merged into a single global descriptor used to train an SVM. In the final proposal, the best SVMs trained in Proposals 1 and 2 are used to generate ensembles of classifiers. Their models reach an accuracy of 84.7 percent [18].

Michael Norval, Zenghui Wang, and Yanxia Sun investigated the accuracy of two techniques for detecting pulmonary tuberculosis from patients' chest X-ray images using convolutional neural networks. Various image-preparation methods were compared to see which combination produces the best results, and a hybrid strategy was also studied, combining the original statistical computer-aided detection method with neural networks. A total of 406 normal and 394 abnormal images were used in the simulations. Even better results were obtained when the images were further enhanced; the hybrid technique yielded the maximum accuracy of 92.54 percent [19].

Pike Msonda, Sait Ali Uymaz, and Seda Sogukpinar Karaagac discussed the effects of Spatial Pyramid Pooling (SPP) on automatic tuberculosis diagnosis from CXRs. Three distinct CNN models (AlexNet, ResNet50, and GoogLeNet) were trained from scratch with and without SPP. Thanks to multilevel pooling, SPP can obtain a more robust combination of features, which increases accuracy. The study employed three separate datasets to build these CNN models. Two of them (Montgomery and Shenzhen) are publicly available and were used to compare the success of the proposed SPP models with other approaches; the third dataset (KERH) was provided by the Konya Education Research Hospital (Turkey). The training results of all the models on the KERH dataset were better than on the two public datasets.
AlexNet scores 0.94 without SPP and 0.95 with SPP, which is rather impressive; ResNet50 scores a comparable 0.93 without SPP and 0.94 with SPP. GoogLeNet and GoogLeNet-SPP yield the best results, with 0.97 and 0.98 validation accuracy, respectively [20].

Xudong Liu, Haoxiang Lei, and Sicun Han developed a method that allows a computer to extract features from and recognize images of human lungs, automatically determining the lungs' health status from a database. They trained a CNN model on the datasets, after which the system could perform some basic analysis. They also employed a fixed coordinate to reduce noise and paired the Canny algorithm with the Mask algorithm to further increase the system's accuracy, finally achieving a maximum accuracy of 87.0 percent [21].

A model for identifying tuberculosis was proposed by Payal Gidwani, Urmi Gori, Aayush Dedhia, and Nasim Banu Shah. The overall accuracy with the Adam optimizer was 87 percent, with a validation loss of about 0.32 [22].

Thi Kieu Khanh Ho, Jeonghwan Gwak, Om Prakash, Jong-In Song, and Chang Min Park used the public ChestXray14 as a training dataset and Montgomery and Shenzhen as two external testing datasets to investigate the efficiency of deep convolutional neural networks (DCNNs) for detecting TB on chest radiographs. First, several preprocessing techniques, t-SNE visualization, and data augmentation were carried out.
The X-ray images were then classified as showing pulmonary TB symptoms or as healthy using three distinct pre-trained DCNNs: ResNet152, Inception-ResNet, and DenseNet121. They found that proper data augmentation can further improve DCNN accuracy, with the best classifiers reaching average accuracies of 95 percent for DenseNet121, 91 percent for Inception-ResNet, and 77 percent for ResNet152 [23].

Decompose, Transfer, and Compose (DeTraC) is a novel CNN architecture based on class decomposition, presented by Asmaa Abbas, Mohammed M. Abdelsamea, and Mohamed Medhat Gaber to improve the performance of medical image classification using transfer learning and a class decomposition technique. DeTraC allows more separable learning at the subclass level, with the potential for faster convergence. They validated the method on three separate cohorts: chest X-ray images, histological images of human colorectal cancer, and digital mammograms, and showed that DeTraC outperforms current CNN models in accuracy, sensitivity, and specificity [24].

III. DATA INFORMATION
The dataset was taken from the well-known online platform Kaggle; the original owners are Tawsifur et al. [25]. The dataset contained 3500 Normal (non-TB) images and 700 TB-patient images. Due to the high mismatch between the two classes, we took 914 normal images, and 202 augmented TB images were added to the 700 original TB images to make 902. This total of 1816 images, each with a resolution of 512 × 512 pixels, was divided into training and validation sets according to Table I.

IV. RESEARCH METHODOLOGY

Fig. 2 shows a flow chart of our methodology. After collecting the image datasets, the images were resized to 224 × 224 pixels for faster training. Due to the lack of TB X-rays, augmentation was performed. The batch size was set to 10 before training. Loss was calculated using Binary Cross Entropy, as this is a binary (Normal or TB) classification.

V. AUGMENTATION AND PRECAUTIONS

202 TB images were created using augmentation. During this, the maximum rotation range allowed was 10°, and parameters such as the width-shift, height-shift, shear, and zoom ranges were set to only 5% of the original image. Fig. 3 shows some augmented images.
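The shift-style augmentations above can be sketched in plain NumPy; the helper names are illustrative and this is not the generator pipeline actually used in the experiments (rotation and shear, which need interpolation, are omitted for brevity):

```python
import numpy as np

def width_shift(img, frac=0.05):
    """Shift image columns right by frac of the width, zero-padding the gap."""
    shift = max(1, int(round(img.shape[1] * frac)))
    out = np.zeros_like(img)
    out[:, shift:] = img[:, :-shift]
    return out

def height_shift(img, frac=0.05):
    """Shift image rows down by frac of the height, zero-padding the gap."""
    shift = max(1, int(round(img.shape[0] * frac)))
    out = np.zeros_like(img)
    out[shift:, :] = img[:-shift, :]
    return out

# A 512x512 X-ray shifted by 5% moves roughly 26 pixels on each axis.
xray = np.random.rand(512, 512)
aug = width_shift(height_shift(xray))
print(aug.shape)  # (512, 512)
```

Keeping the shift, shear, and zoom ranges at 5% and rotation at 10° preserves the gross anatomy of the X-ray, so the augmented images remain plausible TB examples.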

VI. TRANSFER LEARNING WITH PRE-TRAINED CNN MODEL
After augmentation, a total of 1446 images (724 normal chest X-rays and 722 TB images) were used for training in three pre-trained models: InceptionV3, VGG16, and Xception. Seven different optimizers were used. VGG16, which has 13 convolutional layers, 3×3 filters, and 2×2 max pooling [26], performed the best among the three ImageNet networks. The other models fluctuated on their validation sets, which happens due to the scarcity of data in the training set.
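A minimal Keras sketch of this transfer-learning setup, assuming TensorFlow/Keras as the framework (the text does not name one); the size of the classification head is illustrative, not taken from the paper:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

def build_tb_classifier(weights="imagenet"):
    """VGG16 feature extractor with a new binary (Normal/TB) head."""
    base = VGG16(weights=weights, include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False  # freeze the pre-trained convolutional layers
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(128, activation="relu"),   # head size is illustrative
        layers.Dense(1, activation="sigmoid"),  # single binary output
    ])
    model.compile(optimizer="adamax", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

The same pattern applies to InceptionV3 and Xception by swapping the `base` network; only the small head is trained, which is why these models can be fitted on 1446 images at all.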

VII. EXPERIMENTAL RESULT
In the first session, three pre-trained CNN models (InceptionV3, VGG16, and Xception) were used with each of seven optimizers: Adam, Adagrad, Adadelta, Adamax, Nadam, RMSprop, and SGD. A total of 21 models were thus trained and evaluated on 370 test images. Models were also trained and evaluated with other parameters changed, such as the loss function and batch size.
Finally, accuracy was computed from the confusion matrices, and precision, recall, and F1 scores were observed.
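The two loss functions compared in these runs, Binary Cross Entropy and Squared Hinge, can be written out directly; a NumPy sketch on illustrative predictions (labels in {0, 1} for cross-entropy, remapped to {-1, +1} for squared hinge):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean BCE over a batch; y_true in {0,1}, y_pred in (0,1)."""
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def squared_hinge(y_true, y_pred):
    """Mean squared hinge; y_true remapped from {0,1} to {-1,+1}."""
    t = 2.0 * y_true - 1.0
    return np.mean(np.maximum(0.0, 1.0 - t * y_pred) ** 2)

# Illustrative batch: three TB images and one normal image.
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.1, 0.8, 0.95])
print(round(float(binary_cross_entropy(y_true, y_pred)), 4))  # 0.1213
```

BCE penalizes confident wrong probabilities logarithmically, while squared hinge penalizes margin violations quadratically, which is why the two can rank the same model differently.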

A. Loss and Accuracy Curves
The validation-set loss curves of VGG16 with the Adagrad and Adamax optimizers and Binary Cross Entropy loss closely track the training curves, as can be seen in Figs. 5 and 6, which means the models are not overfitting. After 20 epochs (on the X axis), the training and validation losses were 0.1119 and 0.1011 for Adagrad, and 0.0234 and 0.0258 for Adamax, respectively.

The accuracy curves for both optimizers with this loss function are given in Figs. 7 and 8. These models showed excellent results in detecting TB from the validation sets, which is discussed later. After 20 epochs, the training and validation accuracies were 0.9682 and 0.9865 for Adagrad, and 0.9945 and 0.9865 for Adamax, respectively.

The other models, InceptionV3 and Xception, showed fluctuation in validation. The likely cause is inadequate training data; augmentation did not help much in their cases. Some loss curves for 20 epochs are given in Figs. 9, 10, and 11. The Adadelta optimizer had a large gap between the training and validation curves, which means these models did not learn as well as with the other optimizers, as demonstrated in Figs. 12 and 13. Training and validation accuracies were 92.32% and 81.35% for InceptionV3, and 94.05% and 79.19% for Xception, respectively.

B. Confusion Matrix
As expected, VGG16 classified very well: with the RMSprop and Nadam optimizers and the Binary Cross Entropy and Squared Hinge loss functions, it had only 1 misclassified TB image. The confusion matrices of those two are given in Figs. 14 and 15. The confusion matrices for the Adam and Nadam optimizers with Binary Cross Entropy loss also show good performance; they are given in Figs. 16 and 17, respectively.
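The metrics read off these confusion matrices follow directly from the TP/FP/FN/TN counts; a small sketch with illustrative counts (not the paper's actual matrices) for a 370-image test set with a single misclassified TB image:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 for the positive (TB) class, plus accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Illustrative split of 370 test images: one TB image missed, none falsely flagged.
p, r, f1, acc = classification_metrics(tp=182, fp=0, fn=1, tn=187)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} acc={acc:.3f}")
```

With zero false positives, precision is exactly 1.0 while the single false negative pulls recall just below it; F1, as the harmonic mean, sits between the two.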
But the Adadelta optimizer did not perform as well as the others at detecting TB images. Confusion matrices using the Adadelta optimizer with all three CNN networks are given in Figs. 18-20. A full performance summary of all parameter combinations is presented in Table II. The top 3 best and worst cases for validation loss, validation accuracy, precision, recall, and F1 score are also tabulated in Table III.

VIII. DISCUSSION

The VGG16 models perform well on this dataset, which is why their classification reports show the highest results: a maximum recall of 100% for Normal and 99% for TB, and a maximum precision of 99% for Normal and 100% for TB. F1 scores are also much higher (a maximum of 100% for both Normal and TB) than for the other models, InceptionV3 and Xception. A performance comparison of this paper with Abbas et al. [24] is shown in Table IV.
A comparison of the data used with that of Abbas et al. [24] is shown in Table V.

IX. CONCLUSION & FUTURE WORK
This paper is a preliminary guideline for new biomedical researchers interested in Machine Learning or Deep Learning. It gives an idea of how pre-trained networks can be used to classify medical images, and how to observe loss and accuracy curves and create classification reports. As future work, we propose hyper-tuned networks for the models that could not perform well on this dataset. Image segmentation restricted to the lung field, using networks such as U-Net, could also improve the outcome.

ACKNOWLEDGMENT

First and foremost, praise and appreciation to Almighty Allah for showering his blessings on us throughout our research work, allowing us to successfully conclude the research. Special thanks go to Google for helping us free of cost by providing a virtual environment, Google Colab, for running, executing, testing, and validating our code smoothly and very quickly throughout our whole thesis work; it saved more time than we can express. We extend our thanks to the people who, through hard work, built the datasets of X-ray images from different hospitals and diagnostic centers and so made our work much easier. Finally, we want to express our gratitude to everyone who has helped us accomplish this research, whether directly or indirectly.