Configuration Files

Features Extraction

In MEDimage, all subpackages and modules need a specific configuration to be used correctly, so they all rely on a single JSON configuration file. This file contains parameters for each step of the workflow (processing, extraction…). For example, each IBSI test requires its own set of radiomics extraction parameters. You can check a full example of the file here: notebooks/ibsi/settings/.

This section walks you through how to set up and use the configuration file. It is divided into the following subdivisions:

General Analysis Parameters

n_batch

The number of batches to use in parallel computations. Set it to 0 for serial computation.
type	int

e.g.

{
    "n_batch" : 8
}

roi_type_labels

A list of labels for the regions of interest (ROI) to use in the analysis. The labels must match the names of the corresponding CSV files. For example, if you have a CSV file named `roiNames_GTV.csv`, then `roi_type_labels` must be `["GTV"]`.
type	List[str]

e.g.

{
    "roi_type_labels" : ["GTV"]
}
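The naming rule above can be sketched as a small helper. The function name `expected_roi_csv` is hypothetical and not part of MEDimage; it only illustrates how a label maps to a file name under the `roiNames_<label>.csv` convention described above.

```python
def expected_roi_csv(label: str) -> str:
    """Return the CSV file name that a given ROI type label is expected to match."""
    return f"roiNames_{label}.csv"


# Settings fragment taken from the example above
settings = {"roi_type_labels": ["GTV"]}

# One expected CSV file name per label
csv_names = [expected_roi_csv(label) for label in settings["roi_type_labels"]]
print(csv_names)
```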

roi_types

A list of labels that describe the regions of interest, used to save the analysis results. The labels must accurately reflect the regions analyzed. For instance, if a `"GTV"` area contains two different ROIs (`"Mass"` and `"Edema"`) and you analyze only one of them, the label can be `["GTVMassOnly"]`. This name will be displayed in the JSON results file.
type	List[str]

e.g.

{
    "roi_types" : ["GTVMassOnly"]
}

Pre-checks Parameters

The pre-radiomics checks configuration is a set of parameters used by the DataManager class. These parameters must be set in a nested dictionary as follows:

{
    "pre_radiomics_checks": {"All parameters go inside this dict"}
}

wildcards_dimensions

List of wildcards for voxel dimension checks (Read about wildcards here). Checks will be run for every wildcard in the list. For example `["Glioma*.MRscan.npy", "STS*.CTscan.npy"]`
type	List[str]

e.g.

{
    "pre_radiomics_checks" : {
        "wildcards_dimensions" : ["Glioma*.MRscan.npy", "STS*.CTscan.npy"]
        }
}
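To get a feel for which files a wildcard such as `Glioma*.MRscan.npy` selects, the sketch below uses shell-style matching from Python's standard `fnmatch` module. This is an illustration only: the file names are made up, the helper `match_files` is hypothetical, and MEDimage's own matching logic may differ.

```python
from fnmatch import fnmatchcase

wildcards = ["Glioma*.MRscan.npy", "STS*.CTscan.npy"]

# Hypothetical file names, for illustration only
files = [
    "Glioma-TCGA-001__T1.MRscan.npy",
    "STS-McGill-001__CT.CTscan.npy",
    "STS-McGill-001__PET.PTscan.npy",
]


def match_files(wildcard, names):
    """Return the names that match a shell-style wildcard (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, wildcard)]


# Map every wildcard to the files it would select
matches = {wildcard: match_files(wildcard, files) for wildcard in wildcards}
for wildcard, selected in matches.items():
    print(wildcard, "->", selected)
```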

wildcards_window

List of wildcards for intensity window checks (Read about wildcards here). Checks will be run for every wildcard in the list. For example `["Glioma*.MRscan.npy", "STS*.CTscan.npy"]`
type	List[str]

e.g.

{
    "pre_radiomics_checks" : {
        "wildcards_window" : ["Glioma*.MRscan.npy", "STS*.CTscan.npy"]
        }
}

path_data

Path to your data (`MEDscan` class pickle objects)
type	str

e.g.

{
    "pre_radiomics_checks" : {
        "path_data" : "home/user/medimage/data/npy/sts"
        }
}

path_csv

Path to your dataset csv file (Read more about the CSV File)
type	str

e.g.

{
    "pre_radiomics_checks" : {
        "path_csv" : "home/user/medimage/data/csv/roiNames_GTV.csv"
        }
}

path_save_checks

Path where the pre-checks results will be saved
type	str

e.g.

{
    "pre_radiomics_checks" : {
        "path_save_checks" : "home/user/medimage/checks"
        }
}

Note

Initializing the pre-radiomics checks settings is optional; it can also be done in the DataManager instance initialization step.

Processing Parameters

Each imaging modality should have its own params dict inside the JSON file and should be organized as follows:

{
    "imParamMR": {"Processing parameters for MR modality"},
    "imParamCT": {"Processing parameters for CT modality"},
    "imParamPET": {"Processing parameters for PET modality"}
}

box_string

Box of the ROI used in the workflow.
type	string
options
`full`	Use the full ROI
	type	string
`box`	Use the smallest box possible
	type	string
`box{n}`	For example, `box10` adds 10 voxels in all three dimensions to the smallest bounding box. The number after ‘box’ defines the number of voxels to add.
	type	string
`{n}box`	For example, `2box` uses double the size of the smallest box. The number before ‘box’ defines the size multiplier.
	type	string

e.g.

{
    "imParamCT" : {
        "box_string" : "box7"
    },
    "imParamMR" : {
        "box_string" : "box"
    },
    "imParamPET" : {
        "box_string" : "2box"
    }
}
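The four `box_string` forms listed above can be told apart with a short parser. This is a minimal sketch, not MEDimage's implementation: the function name `parse_box_string` and the mode names `"pad"` and `"scale"` are hypothetical labels chosen for illustration.

```python
import re


def parse_box_string(box_string):
    """Split a box_string into a (mode, n) pair.

    Modes (hypothetical names): "full", "box", "pad" for box{n}, "scale" for {n}box.
    """
    if box_string in ("full", "box"):
        return (box_string, None)
    padded = re.fullmatch(r"box(\d+)", box_string)
    if padded:
        # box{n}: n voxels added around the smallest bounding box
        return ("pad", int(padded.group(1)))
    scaled = re.fullmatch(r"(\d+)box", box_string)
    if scaled:
        # {n}box: smallest bounding box multiplied in size by n
        return ("scale", int(scaled.group(1)))
    raise ValueError(f"Unrecognized box_string: {box_string!r}")


for value in ("full", "box", "box7", "2box"):
    print(value, "->", parse_box_string(value))
```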

interp

Interpolation parameters.
type	dict
options
`scale_non_text`	Size-3 list giving the new voxel size
	type	List[float]
`scale_text`	List of size-3 lists giving the new voxel sizes for texture features (features will be computed for each list)
	type	List[List[float]]
`vol_interp`	Volume interpolation method (“linear”, “spline” or “cubic”)
	type	string
`gl_round`	This option should be set only for CT scans. Set it to 1 to round values to the nearest integers (must be a power of 10)
	type	float
`roi_interp`	ROI interpolation method (“nearest”, “linear” or “cubic”)
	type	string
`roi_pv`	Rounding value for ROI intensities. Must be between 0 and 1.
	type	float

e.g.

{
    "imParamMR" : {
        "interp" : {
            "scale_non_text" : [1, 1, 1],
            "scale_text" : [[1, 1, 1]],
            "vol_interp" : "linear",
            "gl_round" : [],
            "roi_interp" : "linear",
            "roi_pv" : 0.5
        }
    },
    "imParamCT" : {
        "interp" : {
            "scale_non_text" : [2, 2, 3],
            "scale_text" : [[2, 2, 3]],
            "vol_interp" : "nearest",
            "gl_round" : 1,
            "roi_interp" : "nearest",
            "roi_pv" : 0.5
        }
    },
    "imParamPET" : {
        "interp" : {
            "scale_non_text" : [3, 3, 3],
            "scale_text" : [[3, 3, 3]],
            "vol_interp" : "spline",
            "gl_round" : [],
            "roi_interp" : "spline",
            "roi_pv" : 0.5
        }
    }
}
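The constraints stated in the option list (size-3 voxel sizes, `gl_round` a power of 10, `roi_pv` between 0 and 1) can be checked before running the workflow. The sketch below is a hypothetical helper, not a MEDimage function; it also assumes, based on the examples above, that an empty list for `gl_round` means the option is not set.

```python
import math


def check_interp(interp):
    """Return a list of problems found in one `interp` dictionary."""
    errors = []
    # scale_non_text is one voxel size; scale_text is a list of voxel sizes
    scales = [interp["scale_non_text"]] + list(interp["scale_text"])
    if any(len(scale) != 3 for scale in scales):
        errors.append("every voxel size must have 3 elements")
    gl_round = interp["gl_round"]
    if gl_round != []:  # assumption: an empty list means "not set"
        is_power_of_10 = gl_round > 0 and math.isclose(
            math.log10(gl_round), round(math.log10(gl_round)), abs_tol=1e-9
        )
        if not is_power_of_10:
            errors.append("gl_round must be a power of 10")
    if not 0 <= interp["roi_pv"] <= 1:
        errors.append("roi_pv must be between 0 and 1")
    return errors


ct_interp = {
    "scale_non_text": [2, 2, 3],
    "scale_text": [[2, 2, 3]],
    "vol_interp": "nearest",
    "gl_round": 1,
    "roi_interp": "nearest",
    "roi_pv": 0.5,
}
print(check_interp(ct_interp))
```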

reSeg

Resegmentation parameters.
type	dict
options
`range`	Resegmentation range: a 2-element list consisting of the minimum and maximum intensity values. Use `"inf"` for infinity
	type	List
`outliers`	Outlier resegmentation algorithm. For now, `MEDimage` only implements the `"Collewet"` algorithm. Leave empty for no outlier resegmentation
	type	string

e.g.

{
    "imParamMR" : {
        "reSeg" : {
            "range" : [0, "inf"],
            "outliers" : ""
        }
    },
    "imParamCT" : {
        "reSeg" : {
            "range" : [-500, 500],
            "outliers" : "Collewet"
        }
    },
    "imParamPET" : {
        "reSeg" : {
            "range" : [0, "inf"],
            "outliers" : "Collewet"
        }
    }
}
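To illustrate how a `range` entry such as `[0, "inf"]` can be interpreted, the sketch below converts the `"inf"` string to a floating-point infinity and flags which intensities fall inside the range. The helper `in_reseg_range` is hypothetical, and treating both bounds as inclusive is an assumption rather than a documented MEDimage behavior.

```python
def in_reseg_range(intensities, reseg_range):
    """Return one boolean per intensity: True if it lies inside the range.

    Assumption: both bounds are inclusive, and "inf" stands for +infinity.
    """
    low, high = (float("inf") if bound == "inf" else float(bound) for bound in reseg_range)
    return [low <= value <= high for value in intensities]


# CT example from above: keep intensities between -500 and 500
print(in_reseg_range([-600, -500, 0, 500, 501], [-500, 500]))

# MR/PET example from above: keep all non-negative intensities
print(in_reseg_range([-1, 0, 1e9], [0, "inf"]))
```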

discretisation

Discretization parameters.
type	dict
options
`IH`	Discretization parameters for intensity histogram features
	type	dict
`IVH`	Discretization parameters for intensity volume histogram features
	type	dict
`texture`	Discretization parameters for texture features
	type	dict

IH

Discretization parameters for intensity histogram features.
type	dict
options
`type`	Discretization algorithm: `"FBS"` for fixed bin size and `"FBN"` for fixed bin number algorithm. Other possible options: `"FBSequal"` and `"FBNequal"`
	type	string
`val`	Bin size or bin number, depending on the algorithm used
	type	int

IVH

Discretization parameters for intensity volume histogram features.
type	dict
options
`type`	Discretization algorithm: `"FBS"` for fixed bin size and `"FBN"` for fixed bin number algorithm
	type	string
`val`	Bin size or bin number, depending on the algorithm used
	type	int

texture

Discretization parameters for texture features.
type	dict
options
`type`	List of discretisation algorithms: `"FBS"` for fixed bin size and `"FBN"` for fixed bin number. Texture features will be computed for each algorithm in the list
	type	List[string]
`val`	List of bin sizes or bin numbers, depending on the algorithm used. Texture features will be computed for each bin number or bin size in the list
	type	List[List[int]]

e.g. for CT only (the parameters are the same for MR and PET):

{
    "imParamCT" : {
        "discretisation" : {
            "IH" : {
                "type" : "FBS",
                "val" : 25
            },
            "IVH" : {
                "type" : "FBN",
                "val" : 10
            },
            "texture" : {
                "type" : ["FBS"],
                "val" : [[25]]
            }
        }
    }
}
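The difference between the two algorithms can be seen on a handful of intensities. The sketch below follows the fixed bin number and fixed bin size formulas of the IBSI reference manual; the function names `fbn` and `fbs` and the `min_val` argument are hypothetical, and MEDimage's own implementation may handle edge cases differently (for example, `fbn` here assumes the intensities are not all equal).

```python
import math


def fbn(values, n_bins):
    """Fixed bin number: map intensities to bins 1..n_bins over [min, max]."""
    low, high = min(values), max(values)
    return [
        n_bins if x == high else math.floor(n_bins * (x - low) / (high - low)) + 1
        for x in values
    ]


def fbs(values, bin_size, min_val):
    """Fixed bin size: bins of width bin_size, starting at min_val."""
    return [math.floor((x - min_val) / bin_size) + 1 for x in values]


# "IVH": {"type": "FBN", "val": 10}
print(fbn([0, 5, 10], 10))

# "IH": {"type": "FBS", "val": 25}, with -500 as the lowest intensity
print(fbs([-500, -480, -475, 0], 25, -500))
```

With FBN, the number of bins is fixed and the bin width adapts to the intensity range; with FBS, the bin width is fixed and the number of bins depends on the range.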

compute_suv_map

Computation of the SUV map for PET scans. Default `True`
type	bool
options
`True`	Will compute the SUV map for PET scans.
	type	bool
`False`	Will not compute the SUV map; it must have been computed beforehand.
	type	bool

This parameter is only used for PET scans and is set as follows:

{
    "imParamPET" : {
        "compute_suv_map" : true
        }
}

Note

This parameter concerns PET scans only. MEDimage only computes the SUV map for DICOM scans, since the computation relies on DICOM headers; for NIfTI scans, it assumes the SUV map has already been computed.

filter_type

Name of the filter to use on the scan. Empty string by default.
type	string
options
`mean`	Filter images using `mean` filter.
	type	string
`log`	Filter images using `log` filter.
	type	string
`gabor`	Filter images using `gabor` filter.
	type	string
`laws`	Filter images using `laws` filter.
	type	string
`wavelet`	Filter images using `wavelet` filter.
	type	string

e.g.

{
    "imParamPET" : {
        "filter_type" : "mean"
        },
    "imParamMR" : {
        "filter_type" : "laws"
        },
    "imParamCT" : {
        "filter_type" : "log"
        }
}

Extraction Parameters

Extraction parameters are organized in the same way as the processing parameters: each imaging modality should have its own parameters, and the JSON file should be organized as follows:

{
    "imParamMR": {"Extraction params for MR modality"},
    "imParamCT": {"Extraction params for CT modality"},
    "imParamPET": {"Extraction params for PET modality"}
}

glcm dist_correction

GLCM features weighting norm. `False` by default.
type	Union[bool, str]
options
`manhattan`	Will use `"manhattan"` weighting norm.
	type	string
`euclidean`	Will use `"euclidean"` weighting norm.
	type	string
`chebyshev`	Will use `"chebyshev"` weighting norm.
	type	string
`True`	Will use discretization length difference corrections as used by the Institute of Physics and Engineering in Medicine.
	type	bool
`False`	`False` to replicate IBSI results.
	type	bool

e.g.

{
    "imParamMR" : {
        "glcm" : {
            "dist_correction" : false
        }
    },
    "imParamCT" : {
        "glcm" : {
            "dist_correction" : "chebyshev"
        }
    },
    "imParamPET" : {
        "glcm" : {
            "dist_correction" : "euclidean"
        }
    }
}
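The three named norms differ in how they measure the length of a direction vector between neighbouring voxels. The sketch below only computes those lengths; the helper `direction_norm` is hypothetical, and how MEDimage turns the resulting distance into a weight is not shown here.

```python
import math


def direction_norm(direction, norm):
    """Length of a direction vector under the chosen norm."""
    absolute = [abs(component) for component in direction]
    if norm == "manhattan":
        return float(sum(absolute))
    if norm == "euclidean":
        return math.sqrt(sum(component * component for component in direction))
    if norm == "chebyshev":
        return float(max(absolute))
    raise ValueError(f"Unknown norm: {norm!r}")


# A diagonal in-plane neighbour
diagonal = (1, 1, 0)
for name in ("manhattan", "euclidean", "chebyshev"):
    print(name, direction_norm(diagonal, name))
```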

glcm merge_method

GLCM features aggregation method. `"vol_merge"` by default.
type	string
options
`vol_merge`	Features are extracted from a single matrix after merging all 3D directional matrices.
	type	string
`slice_merge`	Features are extracted from a single matrix after merging 2D directional matrices per slice, and then averaged over slices.
	type	string
`dir_merge`	Features are extracted from a single matrix after merging 2D directional matrices per direction, and then averaged over directions.
	type	string
`average`	Features are extracted from each 3D directional matrix and averaged over the 3D directions
	type	string

e.g.

{
    "imParamMR" : {
        "glcm" : {
            "merge_method" : "average"
        }
    },
    "imParamCT" : {
        "glcm" : {
            "merge_method" : "vol_merge"
        }
    },
    "imParamPET" : {
        "glcm" : {
            "merge_method" : "dir_merge"
        }
    }
}

glrlm dist_correction

GLRLM features weighting norm. `False` by default.
type	Union[bool, str]
options
`manhattan`	Will use `"manhattan"` weighting norm.
	type	string
`euclidean`	Will use `"euclidean"` weighting norm.
	type	string
`chebyshev`	Will use `"chebyshev"` weighting norm.
	type	string
`True`	Will use discretization length difference corrections as used by the Institute of Physics and Engineering in Medicine.
	type	bool
`False`	`False` to replicate IBSI results.
	type	bool

e.g.

{
    "imParamMR" : {
        "glrlm" : {
            "dist_correction" : false
        }
    },
    "imParamCT" : {
        "glrlm" : {
            "dist_correction" : "chebyshev"
        }
    },
    "imParamPET" : {
        "glrlm" : {
            "dist_correction" : "euclidean"
        }
    }
}

glrlm merge_method

GLRLM features aggregation method. `"vol_merge"` by default.
type	string
options
`vol_merge`	Features are extracted from a single matrix after merging all 3D directional matrices.
	type	string
`slice_merge`	Features are extracted from a single matrix after merging 2D directional matrices per slice, and then averaged over slices.
	type	string
`dir_merge`	Features are extracted from a single matrix after merging 2D directional matrices per direction, and then averaged over directions.
	type	string
`average`	Features are extracted from each 3D directional matrix and averaged over the 3D directions
	type	string

e.g.

{
    "imParamMR" : {
        "glrlm" : {
            "merge_method" : "average"
        }
    },
    "imParamCT" : {
        "glrlm" : {
            "merge_method" : "vol_merge"
        }
    },
    "imParamPET" : {
        "glrlm" : {
            "merge_method" : "dir_merge"
        }
    }
}

ngtdm dist_correction

NGTDM features weighting norm. `False` by default.
type	bool
options
`True`	Will use discretization length difference corrections as used by the Institute of Physics and Engineering in Medicine.
	type	bool
`False`	`False` to replicate IBSI results.
	type	bool

e.g.

{
    "imParamMR" : {
        "ngtdm" : {
            "dist_correction" : false
        }
    },
    "imParamCT" : {
        "ngtdm" : {
            "dist_correction" : true
        }
    },
    "imParamPET" : {
        "ngtdm" : {
            "dist_correction" : true
        }
    }
}

Filtering Parameters

Filtering parameters are organized in a separate dictionary that contains one sub-dictionary of parameters for every filter available in MEDimage:

{
    "imParamFilter": {
        "mean": {"mean filter params"},
        "log": {"log filter params"},
        "laws": {"laws filter params"},
        "gabor": {"gabor filter params"},
        "wavelet": {"wavelet filter params"},
        "textural": {"textural filter params"}
    }
}

mean

Parameters of the mean filter
type	dict
options
`ndims`	Dimension of the imaging data. Usually 3.
	type	int
`orthogonal_rot`	If `True`, the images will be rotated over all the planes.
	type	bool
`size`	Size of the filter kernel.
	type	int
`padding`	Padding mode, default `"symmetric"`. All the padding modes possible can be found here
	type	string
`name_save`	Saving name added to the end of every radiomics extraction results table (Only if the filter was applied).
	type	string

e.g.

{
    "imParamFilter" : {
        "mean" : {
            "ndims" : 3,
            "orthogonal_rot": false,
            "size" : 5,
            "padding" : "symmetric",
            "name_save" : "mean5"
        }
    }
}

log

Parameters of the Laplacian of Gaussian filter
type	dict
options
`ndims`	Dimension of the imaging data. Usually 3.
	type	int
`sigma`	Standard deviation of the Gaussian, controls the scale of the convolutional operator.
	type	float
`orthogonal_rot`	If `True`, the images will be rotated over all the planes.
	type	bool
`padding`	Padding mode, default `"symmetric"`. All the padding modes possible can be found here
	type	string
`name_save`	Saving name added to the end of every radiomics extraction results table (Only if the filter was applied).
	type	string

e.g.

{
    "imParamFilter" : {
        "log" : {
            "ndims" : 3,
            "sigma" : 1.5,
            "orthogonal_rot" : false,
            "padding" : "constant",
            "name_save" : "log_1.5"
        }
    }
}

laws

Parameters of the Laws filter
type	dict
options
`config`	List of strings naming each 1D filter used to create the Laws kernel. Possible 1D filters: `"L3"`, `"L5"`, `"E3"`, `"E5"`, `"S3"`, `"S5"`, `"W5"` or `"R5"`
	type	List[str]
`energy_distance`	The Chebyshev distance used to create the Laws texture energy image.
	type	float
`rot_invariance`	If `True`, rotational invariance will be approximated.
	type	bool
`orthogonal_rot`	If `True`, the images will be rotated over all the planes.
	type	bool
`energy_image`	If `True`, Laws texture energy images are computed.
	type	bool
`padding`	Padding mode, default `"symmetric"`. All the padding modes possible can be found here
	type	string
`name_save`	Saving name added to the end of every radiomics extraction results table (Only if the filter was applied).
	type	string

e.g.

{
    "imParamFilter" : {
        "laws" : {
            "config" : ["L5", "E5", "E5"],
            "energy_distance" : 7,
            "rot_invariance" : true,
            "orthogonal_rot" : false,
            "energy_image" : true,
            "padding" : "symmetric",
            "name_save" : "laws_l5_e5_e5_7"
        }
    }
}

Note

The order of the 1D filters in the Laws filter configuration matters, because the configuration list is used to compute the outer product, and the outer product is not commutative.
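The effect of the ordering can be checked on two 1D kernels. The sketch below uses the common unnormalised textbook forms of `L5` and `E5` (an assumption: MEDimage may apply a normalisation) and a hypothetical `outer` helper; swapping the order yields the transposed 2D kernel, which is a different filter.

```python
# Unnormalised textbook forms of two Laws 1D kernels (assumption)
L5 = [1, 4, 6, 4, 1]      # level
E5 = [-1, -2, 0, 2, 1]    # edge


def outer(u, v):
    """Outer product of two 1D kernels, returned as a list of rows."""
    return [[a * b for b in v] for a in u]


l5_e5 = outer(L5, E5)  # config order ["L5", "E5"]
e5_l5 = outer(E5, L5)  # config order ["E5", "L5"]

print(l5_e5[0])
print(e5_l5[0])
print(l5_e5 == e5_l5)
```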

gabor

Parameters of the Gabor filter
type	dict
options
`sigma`	Standard deviation of the Gaussian envelope, controls the scale of the filter.
	type	float
`lambda`	Wavelength or inverse of the frequency.
	type	float
`gamma`	Spatial aspect ratio.
	type	float
`theta`	Angle of the rotation matrix.
	type	str
`rot_invariance`	If `True`, rotational invariance will be approximated by combining the response maps of several elements of the Gabor filter bank.
	type	bool
`orthogonal_rot`	If `True`, the images will be rotated over all the planes.
	type	bool
`padding`	Padding mode, default `"symmetric"`. All the padding modes possible can be found here
	type	string
`name_save`	Saving name added to the end of every radiomics extraction results table (Only if the filter was applied).
	type	string

e.g.

{
    "imParamFilter" : {
        "gabor" : {
            "sigma" : 5,
            "lambda" : 2,
            "gamma" : 1.5,
            "theta" : "Pi/8",
            "rot_invariance" : true,
            "orthogonal_rot" : true,
            "padding" : "symmetric",
            "name_save" : "gabor_5_2_1.5"
        }
    }
}

Note

The theta parameter is an angle in radians but must be specified as a string; for example, \(\frac{\pi}{2}\) should be specified as “Pi/2”.
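A `theta` string such as `"Pi/8"` from the example above has to be turned into a number of radians at some point. The sketch below shows one way to do that; the function `parse_theta` is hypothetical, it only supports the forms `"Pi"` and `"Pi/<number>"`, and it does not reflect how MEDimage parses the string internally.

```python
import math


def parse_theta(theta: str) -> float:
    """Convert "Pi" or "Pi/<number>" into an angle in radians."""
    text = theta.strip().lower()
    if text == "pi":
        return math.pi
    prefix, separator, denominator = text.partition("/")
    if prefix != "pi" or not separator:
        raise ValueError(f"Unsupported theta string: {theta!r}")
    return math.pi / float(denominator)


print(parse_theta("Pi/8"))
print(parse_theta("Pi/2"))
```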

wavelet

Parameters of the wavelet filter
type	dict
options
`ndims`	Dimension of the imaging data. Usually 3.
	type	int
`basis_function`	Wavelet name used to create the kernel. The Wavelet families and built-ins can be found here. Custom user wavelets are also supported.
	type	string
`subband`	String of the 1D wavelet kernels (`"H"` for high-pass filter or `"L"` for low-pass filter). Must have a size of `ndims`.
	type	string
`level`	The number of decomposition steps to perform.
	type	int
`rot_invariance`	If `True`, rotational invariance will be approximated.
	type	bool
`padding`	Padding mode, default `"symmetric"`. All the padding modes possible can be found here
	type	string
`name_save`	Saving name added to the end of every radiomics extraction results table (Only if the filter was applied).
	type	string

e.g.

{
    "imParamFilter" : {
        "wavelet" : {
            "ndims" : 3,
            "basis_function" : "db3",
            "subband" : "LLH",
            "level" : 1,
            "rot_invariance" : true,
            "padding" : "symmetric",
            "name_save" : "Wavelet_db3_LLH"
        }
    }
}
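The `subband` rule above (one `"H"` or `"L"` character per dimension) is easy to check, and enumerating the valid strings shows how many subbands exist for a given `ndims`. Both helpers below are hypothetical and serve only as an illustration.

```python
from itertools import product


def is_valid_subband(subband: str, ndims: int) -> bool:
    """True if subband has ndims characters, each "H" or "L"."""
    return len(subband) == ndims and set(subband) <= {"H", "L"}


def all_subbands(ndims: int):
    """List every possible subband string for the given number of dimensions."""
    return ["".join(letters) for letters in product("LH", repeat=ndims)]


print(is_valid_subband("LLH", 3))
print(all_subbands(3))
```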

textural

Parameters of the textural filter
type	dict
options
`family`	Texture features family. Only `"glcm"` is supported for now.
	type	string
`discretization`	Discretization parameters for the texture features (Defined down below).
	type	dict
`local`	Whether to discretize the ROI locally or globally.
	type	bool
`size`	Filter size.
	type	int
`name_save`	Saving name added to the end of every radiomics extraction results table (Only if the filter was applied).
	type	string

Discretization (Textural filters)

Discretization parameters for the textural filter.
type	dict
options
`type`	Discretization algorithm: `"FBS"` for fixed bin size and `"FBN"` for fixed bin number algorithm.
	type	string
`bn`	Bin number. Set if `type` is `"FBN"`.
	type	int
`bw`	Bin size. Set if `type` is `"FBS"` or `type` is `"FBN"` and `adapted` is `True`.
	type	int
`adapted`	If `True`, the bin number will be computed using the bin width and the intensity range. Only valid if `type` is `"FBN"`.
	type	bool

e.g.

{
    "imParamFilter" : {
        "textural" : {
            "family" : "glcm",
            "discretization": {
                "type" : "FBN",
                "bn" : null,
                "bw" : 25,
                "adapted" : true
            },
            "size" : 3,
            "local" : true,
            "name_save" : "glcm_local_fbn_25hu_adapted"
        }
    }
}

Example of a full settings dictionary

Here is an example of a complete settings dictionary:

—

Learning

This section walks you through how to set up the configuration file for the machine learning part of the pipeline. It is divided into the following subdivisions:

Experiment Design Parameters

This set of parameters defines the experiment design (data splitting, splitting proportion…). It is organized as follows:

{
    "testSets": ["Define method here"],
    "method name": "Define method here"
}

Now let’s specify the parameters for the selected method; for instance, in the case of the Random and CV methods:

Splitting methods

Type of sets to create.
type	object
properties
Random	Random splitting method.
	type	object
	properties
	method	Method of splitting the data.
		type	string
		options
		`SubSampling`	The data will be randomly split
			type	string
		`Institutions`	The data will be split based on institutions
			type	string
	nSplits	Number of splits to create.
		type	int
	stratifyInstitutions	If `True`, the data will be stratified based on institutions.
		type	bool
	testProportion	Proportion of the test set.
		type	float
	seed	Seed for the random number generator.
		type	int
CV	Cross-validation splitting method.
	type	object
	properties
	nFolds	Number of folds to use.
		type	int
	seed	Seed for the random number generator.
		type	int

Example

{
    "Random": {
        "method": "SubSampling",
        "nSplits": 10,
        "stratifyInstitutions": 1,
        "testProportion": 0.33,
        "seed": 54288
    }
}
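The `nSplits`, `testProportion` and `seed` parameters can be pictured with a plain random sub-sampling routine. This is a minimal sketch under stated assumptions: the function `subsample_splits` and the patient IDs are hypothetical, institution stratification is ignored, and MEDimage's actual splitting code may differ.

```python
import random


def subsample_splits(patient_ids, n_splits, test_proportion, seed):
    """Create n_splits random train/test partitions, reproducible via seed."""
    rng = random.Random(seed)
    n_test = round(len(patient_ids) * test_proportion)
    splits = []
    for _ in range(n_splits):
        shuffled = list(patient_ids)
        rng.shuffle(shuffled)
        splits.append(
            {"test": sorted(shuffled[:n_test]), "train": sorted(shuffled[n_test:])}
        )
    return splits


# Hypothetical patient IDs, with the parameters of the example above
patient_ids = [f"P{i:02d}" for i in range(12)]
splits = subsample_splits(patient_ids, n_splits=10, test_proportion=0.33, seed=54288)
print(len(splits), len(splits[0]["test"]), len(splits[0]["train"]))
```

Fixing the seed makes the splits reproducible across runs, which matters when results from different experiments are compared.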

Data Cleaning Parameters

This set of parameters defines the data cleaning process. It is organized as follows:

{
    "method name": {
        "define parameters here"
    },
    "another method": {
        "define parameters here"
    }
}

Cleaning methods

Feature cleaning method name.
type	object
properties
default	Default cleaning method.
	type	string

Now let’s specify the parameters for the selected cleaning method; for instance, in the case of the default method:

Chosen method’s parameters

Feature cleaning parameters.
type	object
properties
continuous	Continuous feature cleaning parameters.
	type	object
	properties
	missingCutoffps	Maximum percentage cut-offs of missing features per sample. Samples with more missing features than this cut-off will be removed.
		type	float
	covCutoff	Minimal coefficient of variation cut-offs over samples per variable. Variables with less coefficient of variation than this cut-off will be removed.
		type	float
	missingCutoffpf	Maximal percentage cut-offs of missing samples per variable. Features with more missing samples than this cut-off will be removed.
		type	float
	imputation	Imputation method for missing values. Default is `mean`.
		type	string
		options
		`mean`	Impute missing values with the mean of the feature.
			type	string
		`median`	Impute missing values with the median of the feature.
			type	string
		`random`	Impute missing values with a random value from the feature set.
			type	string

Example

{
    "default": {
        "feature": {
            "continuous": {
                "missingCutoffps": 0.25,
                "covCutoff": 0.1,
                "missingCutoffpf": 0.1,
                "imputation": "mean"
            }
        }
    }
}
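As an illustration of two of these parameters, the sketch below drops features whose share of missing samples exceeds `missingCutoffpf` and then applies `mean` imputation to the remaining ones. It is a simplified, hypothetical helper: the cut-off is treated as a fraction between 0 and 1, the other cut-offs (`missingCutoffps`, `covCutoff`) are not applied, and the table layout (feature name mapped to a list of values, `None` for missing) is an assumption.

```python
def clean_features(table, missing_cutoff_pf):
    """Drop features with too many missing samples, then mean-impute the rest."""
    cleaned = {}
    for name, values in table.items():
        missing_fraction = values.count(None) / len(values)
        if missing_fraction > missing_cutoff_pf:
            continue  # too many missing samples for this feature
        present = [v for v in values if v is not None]
        mean = sum(present) / len(present)
        cleaned[name] = [mean if v is None else v for v in values]
    return cleaned


# Hypothetical feature table with 10 samples per feature
table = {
    "f1": [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "f2": [None, None, None, None, None, 1.0, 2.0, 3.0, 4.0, 5.0],
}
cleaned = clean_features(table, missing_cutoff_pf=0.1)
print(sorted(cleaned))
```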

Note

Note that you can add as many methods as you want, for other feature types (categorical, ordinal, etc.) and for other cleaning methods (e.g. PCA).

Data Normalization Parameters

Data normalization aims to remove batch effects from the data. This set of parameters defines the data normalization process. It is organized as follows:

{
    "standardCombat": {
        "define parameters here"
    }
}

Chosen method parameters

Normalization method name.
type	string
options
`standardCombat`	Standard Combat normalization method.
	type	string

Note

For now only the standardCombat method is available and it does not require any parameters.

Feature Set Reduction Parameters

Feature set reduction consists of reducing the number of features in the data by removing correlated features, selecting important features, etc. This set of parameters defines the feature set reduction process. It is organized as follows:

{
    "selected method": {
        "define parameters here"
    }
}

method name

Feature set reduction method name.
type	string
options
`FDA`	False discovery avoidance method. Read the paper.
	type	string
`FDAbalanced`	Balanced version of the False discovery avoidance method, where the selected number of features is the same for each table.
	type	string

Now let’s specify the parameters for the selected feature set reduction method; for instance, in the case of the FDA method:

FDA method

Feature set reduction parameters.
type	object
properties
FDA	FDA method’s parameters.
	type	object
	properties
	nSplits	Number of splits to use for the FDA algorithm.
		type	int
	corrType	Type of correlation to use for the FDA algorithm. Default is `Spearman`.
		type	string
		options
		`Spearman`	Spearman correlation.
			type	string
		`Pearson`	Pearson correlation.
			type	string
	threshStableStart	Stability threshold to cut-off the unstable features at the beginning of the FDA algorithm.
		type	float
	threshInterCorr	Threshold to cut-off the inter-correlated features.
		type	float
	minNfeatStable	Minimum number of stable features to keep before inter-correlation step.
		type	int
	minNfeatInterCorr	Minimum number of inter-correlated features to keep.
		type	int
	minNfeat	Minimum number of features to keep at the end of the FDA algorithm.
		type	int
	seed	Seed for the random number generator.
		type	int

Example

{
    "FDA": {
        "nSplits": 100,
        "corrType": "Spearman",
        "threshStableStart": 0.5,
        "threshInterCorr": 0.7,
        "minNfeatStable": 100,
        "minNfeatInterCorr": 60,
        "minNfeat": 5,
        "seed": 54288
    }
}

Note

Only FDA and FDAbalanced methods are available for now and they share the same parameters.

Machine Learning Parameters

This set of parameters defines the machine learning process, algorithm, and parameters. It is organized as follows:

{
    "selected algorithm": {
        "define parameters here"
    }
}

Now let’s specify the parameters for the selected machine learning algorithm; for instance, in the case of the XGBoost algorithm:

ML Algorithm

Machine learning algorithm name.
type	object
properties
XGBoost	XGBoost algorithm.
	type	object
	properties
	varImportanceThreshold	Variable importance threshold. Default is `0.3`. Variables with importance below this threshold will be removed.
		type	float
	optimalThreshold	If `null`, the optimal threshold will be computed. Default is `0.5`.
		type	float
	optimizationMetric	Model’s optimization metric. Default is `AUC`. Only used if `method` is `pycaret`.
		type	string
	method	Method to use for the XGBoost algorithm. Default is `pycaret`.
		type	string
		options
		`pycaret`	Automated using PyCaret.
			type	string
		`random_search`	Random search using a pre-defined grid of parameters.
			type	string
		`grid_search`	Grid search using a pre-defined grid of parameters.
			type	string
	nameSave	Name of the file to save the model.
		type	string
	seed	Seed for the random number generator.
		type	int

Example

{
    "XGBoost": {
        "varImportanceThreshold": 0.3,
        "optimalThreshold": null,
        "optimizationMetric": "AUC",
        "method": "pycaret",
        "nameSave": "XGBoost03AUC",
        "seed": 54288
    }
}

Note

Only the XGBoost algorithm is available for now.

Variables Definition

This set of parameters defines the variables to use for the machine learning process. It is organized as follows:

{
    "selected variable": {
        "define parameters here"
    },
    "combinations": [
        "Insert combinations of variables here"
    ]
}

Variables

Variables to use for the machine learning process.
type	object
properties
combinations	List of variables combinations to use for the study.
	type	List[str]

For the selected variable, you can specify the following parameters:

selected variable

Variable name to use for the machine learning process.
type	object
properties
nameType	Type of variable to use. Must contain `Radiomics` for radiomics features.
	type	string
path	Path to the variable file. Use `"setToFolderNameinWorkspace"` to set the features folder to `FolderName` in the workspace.
	type	string
scans	List of scans to use for the variable. For example, `T1C`.
	type	List[str]
rois	List of ROIs to include in the study (will be used to identify the features file). For example, `GTV`.
	type	List[str]
imSpaces	Radiomics level; the features file name must end with this level. For example, `morph`.
	type	List[str]
var_datacleaning	Data cleaning method to use for the variable. Default is `default`.
	type	string
var_normalization	Data normalization method to use for the variable. Default is `combat`.
	type	string
var_fSetReduction	Feature set reduction method to use for the variable. Default is `FDA`.
	type	string

Example

{
    "var1": {
        "nameType": "RadiomicsMorph",
        "path": "setToMyFeaturesInWorkspace",
        "scans": ["T1CE"],
        "rois": ["GTV"],
        "imSpaces": ["morph"],
        "var_datacleaning": "default",
        "var_normalization": "combat",
        "var_fSetReduction": "FDA"
    },
    "combinations": [
        "var1"
    ]
}