# Correspondence between parameters

Correspondence between parameters was assessed by generating all 3-D parametric maps and post-processing them with a 1.5-voxel (i.e., approximately 1.13 mm) Gaussian spatial filter. Subsequently, we calculated the median, 10th percentile, 90th percentile, mean, skewness, kurtosis, variance and entropy (van Sloun et al. 2017a) of the parametric distributions in each region. The linear correlation between every pair of parameters was assessed in terms of the Pearson correlation coefficient; any correlation with a p value above 0.05 was not deemed to significantly reflect the relation between the parameters. Moreover, as our distributions were not necessarily Gaussian, we employed a Wilcoxon rank-sum test to assess the significance of each feature in distinguishing benign disease from PCa or sPCa.
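The filtering and per-region statistics described above can be sketched as follows. This is an illustrative Python/NumPy/SciPy version (the original work used Matlab); the array shapes, parameter values and histogram binning for the entropy are assumptions, not the authors' exact settings.

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(0)
param_map = rng.gamma(2.0, 1.0, size=(32, 32, 32))  # one 3-D parametric map (synthetic)

# 1.5-voxel Gaussian spatial filter
smoothed = ndimage.gaussian_filter(param_map, sigma=1.5)

def region_features(values):
    """Summary statistics of the parametric distribution within one region."""
    hist, _ = np.histogram(values, bins=64, density=True)  # bin count assumed
    p = hist[hist > 0]
    p = p / p.sum()
    return {
        "median": np.median(values),
        "p10": np.percentile(values, 10),
        "p90": np.percentile(values, 90),
        "mean": np.mean(values),
        "skewness": stats.skew(values),
        "kurtosis": stats.kurtosis(values),
        "variance": np.var(values),
        "entropy": -np.sum(p * np.log2(p)),  # Shannon entropy of the histogram
    }

feats = region_features(smoothed.ravel())

# Pearson correlation between a pair of parameters
# (correlations with p > 0.05 were not considered significant)
other = smoothed + rng.normal(0.0, 0.5, smoothed.shape)
r, p_val = stats.pearsonr(smoothed.ravel(), other.ravel())

# Wilcoxon rank-sum test: benign vs. (s)PCa values of one feature
benign = rng.normal(1.0, 0.3, 40)
malignant = rng.normal(1.4, 0.3, 40)
u_stat, p_rs = stats.ranksums(benign, malignant)
```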

Then, to obtain the single-parametric performance, receiver operating characteristic (ROC) curves were computed to assess each parameter's power to discriminate benign from malignant regions, and benign from significantly malignant regions (van Erkel and Pattynama 1998). Performance was quantified by the area under the ROC curve (ROC-AUC).
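As a minimal sketch of this step, the ROC-AUC for one parameter can be computed directly through its rank-sum (Mann-Whitney U) formulation; the labels and scores below are illustrative, not data from the study.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC-AUC via the rank-sum (Mann-Whitney U) formulation:
    the probability that a random positive scores above a random negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

labels = np.array([0, 0, 0, 1, 1, 1], dtype=bool)  # benign = 0, malignant = 1
scores = np.array([0.2, 0.4, 0.3, 0.8, 0.7, 0.9])  # one parameter's region values
print(roc_auc(scores, labels))  # → 1.0 (perfect separation)
```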

Multi-parametric performance

After a full set of CEUS/CUDI features was established, we exploited a support vector machine (SVM) and a Gaussian mixture model (GMM) as a discriminative and a generative machine-learning strategy, respectively, to combine potentially complementary information. An SVM essentially distinguished two classes by computing a hyperplane in multi-parametric space that maximized the margin between the training-set instances of different classes closest to it (i.e., the support vectors) (Cortes and Vapnik 1995). New instances were subsequently classified depending on their position relative to this hyperplane. Analogously, supervised GMM algorithms identified each class's multi-parametric distribution based on the training set and then assigned each test instance to the class for which membership had the highest probability (McLachlan 1992).
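The two classifier families can be sketched as follows, here with scikit-learn in Python (an assumption for illustration; the original implementation was in Matlab, and the synthetic features and class means are invented):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X_benign = rng.normal(0.0, 1.0, (50, 3))   # synthetic multi-parametric feature vectors
X_malig = rng.normal(2.0, 1.0, (50, 3))
X = np.vstack([X_benign, X_malig])
y = np.array([0] * 50 + [1] * 50)

# Discriminative strategy: maximum-margin hyperplane in feature space
svm = SVC(kernel="rbf").fit(X, y)

# Generative strategy: fit one GMM per class, then classify a new instance
# by whichever class model assigns it the higher likelihood
gmm_benign = GaussianMixture(n_components=2, random_state=0).fit(X_benign)
gmm_malig = GaussianMixture(n_components=1, random_state=0).fit(X_malig)

x_new = np.array([[2.1, 1.9, 2.2]])
pred_svm = int(svm.predict(x_new)[0])
pred_gmm = int(gmm_malig.score_samples(x_new) > gmm_benign.score_samples(x_new))
```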


3-D CEUS for Prediction of Prostate Cancer R. R. WILDEBOER et al.

The machine-learning approaches were implemented in the Matlab environment (Version 2015b, MathWorks, Natick, MA, USA), and their feasibility was assessed by evaluating their performance in a leave-one-prostate-out cross-validation procedure, which is least sensitive to training-set imbalances. To avoid overfitting, we selected the region mean as the most representative feature for all parameters. Parameter selection was subsequently performed on each training set through a backward feature-elimination ranking procedure (Guyon and Elisseeff 2003), again in a leave-one-prostate-out fashion. In this way, the best-suited feature set was determined specifically from the training set of each cross-validation fold, without observing the samples of the test patient.
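The validation loop and the greedy backward elimination can be sketched as below. This is a schematic Python version under stated assumptions: `score_fn` stands in for whatever fold-level performance measure ranks feature subsets, and the stopping rule is a plausible greedy choice rather than the authors' exact criterion.

```python
import numpy as np

def backward_eliminate(train_X, train_y, score_fn, min_feats=1):
    """Greedy backward feature elimination: repeatedly drop the feature
    whose removal hurts the training-set score the least."""
    feats = list(range(train_X.shape[1]))
    best = feats[:]
    best_score = score_fn(train_X[:, feats], train_y)
    while len(feats) > min_feats:
        scores = {f: score_fn(train_X[:, [g for g in feats if g != f]], train_y)
                  for f in feats}
        drop = max(scores, key=scores.get)
        feats.remove(drop)
        if scores[drop] >= best_score:
            best, best_score = feats[:], scores[drop]
    return best

def leave_one_prostate_out(X, y, prostate_ids):
    """Yield (train_idx, test_idx), holding out one prostate at a time so the
    test patient's samples are never seen during training or feature selection."""
    for pid in np.unique(prostate_ids):
        test = prostate_ids == pid
        yield np.where(~test)[0], np.where(test)[0]
```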

For the SVM, a radial-basis-function kernel was adopted. Furthermore, the misclassification penalties for negative and positive instances were weighted in the training phase to compensate for the training-set imbalance. Other hyperparameters were tuned in each training fold. The GMM robustness was improved by adding 0.01 to the diagonal elements of the covariance matrices, and the benign- and malignant-class distributions were fitted with two Gaussians and one Gaussian, respectively. Each parameter was normalized by its 95th-percentile value over the entire set to make the training invariant to parameter scale while preventing outliers from affecting the scaling. The multi-parametric performance was assessed by computing ROC curves based on the double-logit distance to the SVM hyperplane (i.e., 1/[1 + exp(2x)], with x the distance) or on the GMM confidence scores as defined by Wildeboer et al. (2017). These performance measures were also used for the feature selection during training.
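Two of these steps are simple enough to state as code. The double-logit mapping follows the formula given in the text; the normalization helper and its inputs are illustrative assumptions.

```python
import numpy as np

def normalize_95(values):
    """Scale a parameter by its 95th-percentile value over the whole set:
    scale-invariant training without letting extreme outliers set the scale."""
    return np.asarray(values, dtype=float) / np.percentile(values, 95)

def double_logit(distance):
    """Map a signed distance x to the SVM hyperplane into (0, 1)
    via 1 / (1 + exp(2x)); zero distance maps to 0.5."""
    return 1.0 / (1.0 + np.exp(2.0 * np.asarray(distance, dtype=float)))

scores = double_logit(np.array([-2.0, 0.0, 2.0]))  # monotonically decreasing in x
```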