

    Population statistics analysis and data mining techniques were used in [24] to determine cancer morbidity and mortality from the data of a regional cancer registry. However, the false positive rate was not minimized. Multiple aspects of large-scale knowledge mining for medical and disease examination were covered in [25]. A new image-based feature selection method was proposed in [26] to categorize lung computed tomography images with higher accuracy. However, the feature selection rate was not improved.
    Table 1 presents a comparison of the proposed approach with state-of-the-art approaches. The main aim of this paper is to design a diagnosis method for LCD using an ensemble classification algorithm, with the objective of reducing the classification time and the false positive rate compared with the state-of-the-art approaches.
    3. Materials and methods
    In this paper, we propose a WONN-MLB method to improve the performance of LCD diagnosis. WONN-MLB combines a Newton–Raphson's MLMR preprocessing model with a Boosted Weighted Optimized Neural Network Ensemble Classification algorithm. To validate the proposed WONN-MLB method, the Thoracic Surgery Data dataset from the Wroclaw Thoracic Surgery Centre is used [26]. The data describe patients who underwent major lung resections for primary lung cancer in the years 2007–2011. The center is associated with the Department of Thoracic Surgery of the Medical University of Wroclaw and the Lower-Silesian Centre for Pulmonary Diseases, Poland. To conduct the experiments, different numbers of patient records are taken from the Thoracic Surgery Data dataset, i.e., 10,000 patient records. The data set contains information on forced vital capacity, pain before surgery, haemoptysis before surgery, dyspnoea before surgery, cough before surgery, weakness before surgery, peripheral arterial diseases, smoking, asthma, age at surgery, and year survival. Based on this information, the LCD classification is made in the proposed approach.
    Fig. 1. Architecture of proposed approach for LCD diagnosis.
    3.1. Proposed approach
    This section describes the proposed approach and its architecture, based on the WONN-MLB method for LCD. The phases needed to implement and use the approach are shown in Fig. 1: data acquisition (Thoracic Surgery Data dataset, Zięba et al. [2]), feature selection or preprocessing (reducing the feature dimensionality of big data), and ensemble classification (using WONN-MLB). They are discussed in the following subsections.
    The data for the classification problem related to lung cancer patients are obtained from the Thoracic Surgery Domain (TSD) archive of the Department of Thoracic Surgery of the Medical University of Wroclaw and the Lower-Silesian Centre for Pulmonary Diseases, Poland, via the UCI Machine Learning Repository. The data were collected retrospectively at the Wroclaw Thoracic Surgery Centre for 1200 patients who underwent major lung resections for primary lung cancer in the years 2007–2011. We use these predictors, acquired from the online UCI repository (Zięba et al. [2]), for lung cancer prediction.
    3.1.2. Newton–Raphson's Maximum Likelihood and Minimum Redundancy preprocessing
    To address the time complexity and accuracy problems in big data classification, a preprocessing step is first needed to extract the relevant attributes. Conventional techniques cannot remove redundant attributes while extracting the relevant ones, which produces misclassification in LCD diagnosis. Therefore, the Newton–Raphson's Maximum Likelihood and Minimum Redundancy (MLMR) preprocessing technique is developed to extract relevant attributes while removing redundancy.
    Fig. 2. Flow of MLMR preprocessing model.
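    As a rough illustration of how Newton–Raphson iterations can be used for maximum likelihood estimation in general, the sketch below fits the log-odds of a binary outcome. It is not the MLMR model itself: the Bernoulli model, the function names `sigmoid` and `newton_raphson_mle`, and the parameters `tol` and `max_iter` are assumptions chosen for illustration only.

```python
import math


def sigmoid(t):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def newton_raphson_mle(labels, tol=1e-10, max_iter=50):
    """Estimate the log-odds theta of a Bernoulli sample by maximum likelihood.

    Illustrative assumption: labels are 0/1 outcomes and the model is
    P(label = 1) = sigmoid(theta). Newton-Raphson repeatedly applies
    theta <- theta - gradient / hessian to the log-likelihood.
    """
    n = len(labels)
    positives = sum(labels)
    if positives == 0 or positives == n:
        # The maximum-likelihood log-odds is infinite in this case.
        raise ValueError("labels must contain both classes")

    theta = 0.0
    for _ in range(max_iter):
        p = sigmoid(theta)
        gradient = positives - n * p      # first derivative of the log-likelihood
        hessian = -n * p * (1.0 - p)      # second derivative (always negative)
        step = gradient / hessian
        theta -= step
        if abs(step) < tol:
            break
    return theta
```

    For a sample with three positive outcomes out of ten, `sigmoid(newton_raphson_mle(labels))` converges to the sample proportion, which is the known closed-form maximum likelihood estimate and gives a simple check on the iteration.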
    A large-scale ML classifier based on boosted classifiers [2] was used for the classification of biomedical lung cancer data. An iterative process updated the boosting coefficient value to minimize the weighted error function. Although the weighted error was minimized, little attention was paid to the time consumed for lung cancer diagnosis. In this work, an integrated Newton–Raphson's MLMR preprocessing model is applied to the data acquired from the Thoracic Surgery Data dataset (Zięba et al. [2]), with the objective not only of reducing the weighted error but also of minimizing the time consumed. The proposed preprocessing model is based on the Newton–Raphson's method with maximum likelihood, to obtain more robust results than other well-known algorithms such as SVMs (Zięba et al. [2]). The model uses the marginal probabilities ‘prob (x)’ and ‘prob (y)’ of a pair of attributes (where ‘x ∈ Attr’ and ‘y ∈ Attr’) and the joint probability ‘prob (x, y)’, as given in Eq. (1).
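    A common way to measure redundancy between a pair of attributes from their marginal and joint probabilities is mutual information. The sketch below estimates it from empirical frequencies. It is only an assumption that the paper's equation has this form; the function name `mutual_information` and the toy columns in the usage note are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete attributes.

    prob(x), prob(y) and prob(x, y) are estimated as relative frequencies.
    Assumption for illustration: both columns are categorical and aligned
    row by row.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("columns must be non-empty and of equal length")

    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))

    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n                 # joint probability prob(x, y)
        p_x = count_x[x] / n         # marginal probability prob(x)
        p_y = count_y[y] / n         # marginal probability prob(y)
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi
```

    Two identical binary columns such as `[0, 1, 0, 1]` give the maximum value for a balanced binary attribute, while statistically independent columns such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give zero. Under this reading, a high value flags a redundant attribute pair.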