• True negative rate or specificity (TNR) is the ratio of the number of correctly classified genes from the negative class (TN), i.e. non-cancer genes, to the number of all genes from the negative class (TN + FP).
• Precision or selectivity is the number of genes correctly classified as positive (TP) divided by the number of all genes classified as positive (TP + FP). For the negative class, it is analogously the number of genes correctly classified as negative (TN) divided by the number of all genes classified as negative (TN + FN).
• The Matthews correlation coefficient (MCC) is a balanced measure for binary (two-class) classification. It indicates the correlation between the observed and predicted classifications (Matthews, 1975) and is calculated as:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.

Fig. 4. ROC curve for cancer and non-cancer genes.
Fig. 5. Nyquist plots for cancer and non-cancer genes. The non-cancer gene SP 600125 exhibits a larger Nyquist curve owing to its larger size, whereas cancer genes have smaller curves owing to their smaller size.
Gene 685 (2019) 62–69
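The metric definitions above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch that assumes the cancer class is positive and the non-cancer class is negative. The helper `_ratio` and the function `classification_metrics` are hypothetical names introduced only for illustration. When a denominator is zero, the sketch returns 0.0. This includes setting MCC to 0 when any marginal sum is zero, which is a common convention but an assumption here.

```python
import math

def _ratio(num: float, den: float) -> float:
    """Safe division (hypothetical helper): returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Binary-classification metrics from confusion-matrix counts.

    Assumption: positive class = cancer gene, negative class = non-cancer gene.
    """
    return {
        "TPR": _ratio(tp, tp + fn),                 # sensitivity
        "TNR": _ratio(tn, tn + fp),                 # specificity
        "PPV": _ratio(tp, tp + fp),                 # precision, positive class
        "NPV": _ratio(tn, tn + fn),                 # analogous value, negative class
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),  # overall accuracy
        # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 if undefined
        "MCC": _ratio(tp * tn - fp * fn,
                      math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }

# Made-up counts, for illustration only (not results from this study).
metrics = classification_metrics(tp=8, tn=9, fp=1, fn=2)
```

These made-up counts give TPR = 0.8, TNR = 0.9 and accuracy = 0.85. They are not the sensor results reported in this work.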
The genes are classified into cancerous and non-cancerous categories by observing the sensor phase response, based on their hydrophilic and hydrophobic characteristics, respectively. A negative phase indicates hydrophilic characteristics, whereas a positive phase indicates hydrophobic characteristics. The hydrophobicity and hydrophilicity features of the gene primary structure are used to design a two-level binary classifier for gene prediction. In this classifier, the phase response serves as the primary classifier that divides genes into two classes, i.e. cancerous and non-cancerous (Fig. 3). The overall accuracy of the sensor network over the frequency range 1 Hz to 10 MHz is 89.55%, with a true positive rate of 87.06% and a true negative rate of 95.42%. The MCC value achieved in the present work is 0.784. Table 5 gives the performance of the sensor at different frequencies, and Fig. 4 illustrates it with the receiver operating characteristic (ROC) curve. The curve shows the trade-off between sensitivity, i.e. the true positive rate (TPR), and, indirectly, specificity, through the false positive rate (FPR = 1 − specificity). The reference line on the graph represents a random classifier. The performance of the sensor can be quantified by the area under the ROC curve (AUROC), which is 0.912. The ideal AUROC is 1, whereas random guessing gives 0.5. An AUROC below 0.5 indicates predictions worse than random guessing (Hanley and McNeil, 1982; Bewick et al., 2004).
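The following sketch illustrates the phase-sign primary classifier together with the ROC curve and the AUROC. It makes several assumptions. The phase values are hypothetical, a phase of exactly zero is treated as non-cancer, and the negated phase is used as the ranking score, so a more negative phase counts as more cancer-like. The ROC points come from sweeping the decision threshold over the sorted scores, and the area is computed with the trapezoidal rule. The function names are illustrative only.

```python
from typing import List, Sequence, Tuple

def classify_by_phase(phase_deg: float) -> str:
    """Primary classifier sketch: negative phase -> hydrophilic -> cancer.

    Assumption: a phase of exactly 0 is treated as non-cancer.
    """
    return "cancer" if phase_deg < 0 else "non-cancer"

def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """(FPR, TPR) points obtained by lowering the threshold over the scores.

    labels: 1 = positive (cancer), 0 = negative (non-cancer).
    Tied scores are processed together, so ties form a diagonal segment.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr

def auroc(fpr: Sequence[float], tpr: Sequence[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

# Hypothetical phase readings (degrees) and true labels, for illustration only.
phases = [-30.0, -10.0, 5.0, -2.0, 20.0, 40.0]
labels = [1, 1, 1, 0, 0, 0]
predicted = [classify_by_phase(p) for p in phases]
scores = [-p for p in phases]  # more negative phase -> more cancer-like
fpr, tpr = roc_points(scores, labels)
area = auroc(fpr, tpr)
```

For these made-up values, the AUROC works out to 8/9 (about 0.889). This matches the fraction of cancer/non-cancer pairs in which the cancer gene has the more negative phase.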
The results in Table 5 show that the largest number of genes is correctly identified as cancer or non-cancer within the frequency range of 50 kHz to 1 MHz. This is therefore the optimal frequency range for classifying all genes, and it indicates that the sensor design captures the gene features most reliably within this range.
3.3. Nyquist analysis of sensor network
The electrical sensor is used here to classify genes based on their size and primary structure. The dynamic electrical responses of the sensors are investigated through the Nyquist plot representation, and the correspondence between gene size and the shape of the Nyquist curve is established.
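This section does not give the sensor transfer function model. The sketch below therefore uses a hypothetical first-order model, H(jω) = gain / (1 + jωτ), only to show how a Nyquist curve is obtained. The transfer function is evaluated over a log-spaced frequency sweep from 1 Hz to 10 MHz, and its real part is plotted against its imaginary part. Because Im(H) ≤ 0 for this model, the sketch returns −Im(H) so that the curve lies in the positive half-plane. The names `gain`, `tau_s` and the functions below are illustrative assumptions, not the authors' model.

```python
import math
from typing import List, Tuple

def logspace(f_start: float, f_stop: float, n: int) -> List[float]:
    """n frequencies spaced evenly on a log scale from f_start to f_stop (Hz)."""
    ratio = (f_stop / f_start) ** (1.0 / (n - 1))
    return [f_start * ratio ** k for k in range(n)]

def first_order_response(f_hz: float, gain: float, tau_s: float) -> complex:
    """Hypothetical transfer function H(j*omega) = gain / (1 + j*omega*tau)."""
    omega = 2.0 * math.pi * f_hz
    return gain / (1.0 + 1j * omega * tau_s)

def nyquist_points(gain: float, tau_s: float, f_start: float = 1.0,
                   f_stop: float = 10e6, n: int = 200) -> Tuple[List[float], List[float]]:
    """Real part and mirrored imaginary part of H over the frequency sweep.

    For this model Im(H) <= 0, so -Im(H) is returned to place the curve
    in the positive half-plane.
    """
    h = [first_order_response(f, gain, tau_s) for f in logspace(f_start, f_stop, n)]
    return [z.real for z in h], [-z.imag for z in h]

xs, ys = nyquist_points(gain=2.0, tau_s=1e-4)
```

For this particular model, the Nyquist curve is a semicircle of diameter `gain` on the real axis, so a larger gain produces a larger curve. This sketch does not establish whether the actual sensor model scales with gene size in the same way.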
Fig. 5 compares the Nyquist plots obtained from the sensor transfer function model. The plots show the positive half of the imaginary part versus the positive half of the real part of the transfer function. The solid curve represents a non-cancer (hydrophobic) gene, and the dashed curve represents a cancer (hydrophilic) gene. The shape of the Nyquist plot differs between hydrophilic and hydrophobic genes. Non-cancer genes exhibit larger Nyquist curves, whereas cancer genes exhibit smaller ones, consistent with hydrophobic genes being larger and hydrophilic genes being smaller. For example, the top subplot in Fig. 5 shows a larger curve for the hydrophobic gene NUP214, which is 450 amino acids long, and a smaller curve for the hydrophilic gene LOC107815086, which is 98 amino acids long. The same characteristic is observed for the other genes. Thus, the shape of the Nyquist plot correlates with gene size, as hydrophobic genes contain more amino acids than hydrophilic genes. Furthermore, the Nyquist curve (Fig. 6) can be used as a second-level classifier to score the hydrophilicity of genes. Fig. 6 shows the average Nyquist plots corresponding to the gene hydrophilicity ranges, i.e. very hydrophilic, medium hydrophilic and slightly hydrophilic (Table 1). The Nyquist curve for ‘slightly hydrophilic’ genes is the largest among all hydrophilic genes, whereas the smallest curve corresponds to ‘very hydrophilic’ genes. This suggests that gene hydrophilicity is also related to gene length. Therefore, the Nyquist plot is a convenient tool for detecting gene features such as size and primary-structure length (Golub et al., 1999; Long et al., 2011; Parry et al., 2015).

Fig. 6. Gene hydrophilicity range by average Nyquist plot. Very hydrophilic genes have the smallest Nyquist curve and slightly hydrophilic genes the largest, according to their size.
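The source does not state how the average Nyquist curve is formed or how curve size is measured for the second-level classifier. The sketch below assumes one possible approach. First, the curves of all genes in a hydrophilicity class are averaged point by point, assuming they are sampled on a shared frequency grid. Second, the size of the averaged curve is taken as the area of the polygon formed by the curve plus the straight segment closing it back to its first point (shoelace formula). The classes are then ranked from smallest to largest curve. All function names and the area measure are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (real, imag) pairs sampled on a shared frequency grid (assumption).
Curve = List[Tuple[float, float]]

def average_curve(curves: List[Curve]) -> Curve:
    """Pointwise mean of Nyquist curves sampled at the same frequencies."""
    return [(mean(p[0] for p in pts), mean(p[1] for p in pts))
            for pts in zip(*curves)]

def enclosed_area(curve: Curve) -> float:
    """Shoelace area of the polygon formed by the curve, closed to its first point."""
    n = len(curve)
    s = 0.0
    for k in range(n):
        x0, y0 = curve[k]
        x1, y1 = curve[(k + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2

def rank_by_curve_size(groups: Dict[str, List[Curve]]) -> List[str]:
    """Class labels ordered from smallest to largest average Nyquist curve."""
    areas = {label: enclosed_area(average_curve(c)) for label, c in groups.items()}
    return sorted(areas, key=areas.get)
```

Under the ordering described for Fig. 6, a ranking produced this way would list ‘very hydrophilic’ first and ‘slightly hydrophilic’ last. The enclosed area is only one candidate size measure; the peak imaginary part or the real-axis extent would be equally plausible choices.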