GSForge.operations.prospectors module

Prospector operations return either boolean support arrays or arrays of selected genes. Prospector operations differ from analytics in that they are not required to return a ‘result’ for every gene, nor to return the same result on each call.
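Once the gene index is known, the two return forms carry the same information. Below is a minimal NumPy sketch of converting between them. It uses a hypothetical gene index and a hand-written mask, not output from an actual prospector:

```python
import numpy as np

# Hypothetical gene index and boolean support mask (one entry per gene).
genes = np.array(["gene_a", "gene_b", "gene_c", "gene_d"])
support = np.array([True, False, True, False])

# Boolean support array -> array of selected genes.
selected = genes[support]

# Array of selected genes -> boolean support array over the same index.
support_again = np.isin(genes, selected)

print(selected)       # ['gene_a' 'gene_c']
print(support_again)  # [ True False  True False]
```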

GSForge.operations.prospectors.parse_boruta_model(boruta_model, gene_coords, attrs=None, dim='Gene') → xarray.core.dataset.Dataset

Convert a fitted BorutaPy model into an xarray.Dataset object.

Parameters
  • boruta_model – A fitted boruta_py (BorutaPy) model.

  • gene_coords – An array (or index) of the genes passed to boruta_model, in the same order as its input features.

  • attrs – A dictionary to be assigned to the output dataset's attrs.

  • dim – The name of the coordinate dimension (default 'Gene').

Returns

An xarray.Dataset object.
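Here is a minimal sketch of such a conversion. It assumes the conversion reads the standard fitted BorutaPy attributes support_, support_weak_ and ranking_ and stores each one as a per-gene variable. boruta_to_dataset is a hypothetical name and is not GSForge's implementation. A SimpleNamespace stands in for a fitted model so the example runs without boruta installed:

```python
from types import SimpleNamespace

import numpy as np
import xarray as xr


def boruta_to_dataset(model, gene_coords, attrs=None, dim="Gene"):
    """Hypothetical sketch: gather per-gene results of a fitted Boruta model."""
    data_vars = {
        # Fitted BorutaPy models store one value per input feature in these attributes.
        name: (dim, np.asarray(getattr(model, name + "_")))
        for name in ("support", "support_weak", "ranking")
    }
    return xr.Dataset(data_vars, coords={dim: np.asarray(gene_coords)}, attrs=attrs)


# Stand-in for a fitted BorutaPy model, exposing only the attributes read above.
fake_model = SimpleNamespace(
    support_=np.array([True, False, False]),
    support_weak_=np.array([False, True, False]),
    ranking_=np.array([1, 2, 3]),
)

ds = boruta_to_dataset(fake_model, ["gene_a", "gene_b", "gene_c"], attrs={"perc": 100})

# Genes confirmed as relevant, recovered from the Dataset.
confirmed = ds["Gene"].values[ds["support"].values]
```

Every variable shares the single gene dimension. This makes the result easy to align with other per-gene data, or to reduce to a gene list as in the last line.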

class GSForge.operations.prospectors.BorutaProspector(*args, **params)

Bases: GSForge.models._Interface.Interface, param.parameterized.ParameterizedFunction

Runs a single instance of BorutaPy feature selection.

This is a thin wrapper around a BorutaPy model that produces an xarray.Dataset suitable for creating a GSForge.GeneSet object.

Parameters inherited from:

GSForge.models._Interface.Interface: gem, gene_set_collection, selected_gene_sets, selected_genes, gene_set_mode, sample_subset, count_variable, annotation_variables, count_mask, annotation_mask, count_transform

estimator = param.Parameter(readonly=False)

A supervised learning estimator with a ‘fit’ method that, once fitted, exposes a feature_importances_ attribute. Important features must correspond to high absolute values in feature_importances_.

n_estimators = param.Parameter(readonly=False)

If an int, sets the number of estimators in the chosen ensemble method. If ‘auto’, this is determined automatically based on the size of the dataset. The estimator's other parameters must be set when it is initialised.
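For the ‘auto’ case, the sketch below models the tree count on the heuristic BorutaPy appears to use internally. Unbounded trees are treated as depth 10. The target is for each of the twice-n_features columns (real plus shadow) to be considered about 100 times on average. auto_n_estimators is a hypothetical helper. Its constants are assumptions based on that reading of BorutaPy, not a documented guarantee:

```python
import math


def auto_n_estimators(n_features, max_depth=None):
    """Hypothetical sketch of a tree-count heuristic for n_estimators='auto'."""
    depth = 10 if max_depth is None else max_depth  # unbounded trees treated as depth 10
    f_repr = 100  # how often each column should be considered, on average
    n_total = n_features * 2  # real features plus one shadow copy of each
    multi = n_total / (math.sqrt(n_total) * depth)
    return int(multi * f_repr)


print(auto_n_estimators(100))               # 141
print(auto_n_estimators(100, max_depth=5))  # 282
```

Shallower trees each touch fewer features, so the heuristic compensates by asking for more of them.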

perc = param.Integer(readonly=False)

Instead of the maximum, a user-defined percentile of the shadow-feature importances is used as the threshold for comparing shadow and real features. The maximum tends to be too stringent, and perc gives finer control over this. The lower perc is, the more false positives are picked as relevant, but also the fewer relevant features are left out: the usual trade-off. The default (100) is essentially vanilla Boruta, which corresponds to the maximum.
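The comparison between shadow and real features can be sketched for a single round with scikit-learn. This is a simplified illustration, not BorutaPy's code. It first shuffles each column to build shadow features. It then fits a random forest on the real plus shadow columns. Finally, it counts a hit for every real feature whose importance exceeds the perc-th percentile of the shadow importances. Full Boruta repeats this for up to max_iter rounds and tests the accumulated hit counts statistically. boruta_hits is a hypothetical name:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def boruta_hits(X, y, perc=100, random_state=0):
    """One Boruta-style round: compare real importances with shadow importances."""
    rng = np.random.RandomState(random_state)
    # Shadow features: each column shuffled independently, destroying any link to y.
    X_shadow = np.apply_along_axis(rng.permutation, 0, X)
    X_all = np.hstack([X, X_shadow])

    forest = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=random_state)
    forest.fit(X_all, y)

    n = X.shape[1]
    real_imp = forest.feature_importances_[:n]
    shadow_imp = forest.feature_importances_[n:]

    # perc=100 uses the maximum shadow importance (vanilla Boruta); lower values relax it.
    threshold = np.percentile(shadow_imp, perc)
    return real_imp > threshold


X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
strict = boruta_hits(X, y, perc=100)
relaxed = boruta_hits(X, y, perc=50)
print(strict.sum(), relaxed.sum())
```

Both calls use the same seed, so they see identical shadows and importances. The perc=50 threshold can therefore only add hits relative to perc=100, which is the trade-off described for perc.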

alpha = param.Number(readonly=False)

Significance level at which the corrected p-values are rejected in both correction steps.

two_step = param.Boolean(readonly=False)

Set this to False to use the original Boruta implementation, which applies Bonferroni correction only.
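The two correction modes can be sketched on a vector of hit counts. The sketch follows our understanding of BorutaPy's implementation. After n_iter rounds, each feature's hit count is tested against a Binomial(n_iter, 0.5) null. With two_step=True, a Benjamini-Hochberg FDR correction across features is combined with a Bonferroni correction across iterations. With two_step=False, Bonferroni is applied over the number of features only. The helper names and hit counts are hypothetical. Only the acceptance side is shown; BorutaPy also tests features for rejection:

```python
import math

import numpy as np


def binom_sf_half(hits, n_iter):
    """P(X >= hits) for X ~ Binomial(n_iter, 0.5): probability of this many hits by chance."""
    return sum(math.comb(n_iter, k) for k in range(hits, n_iter + 1)) / 2 ** n_iter


def benjamini_hochberg(pvals, alpha):
    """Boolean mask of p-values rejected by the Benjamini-Hochberg FDR procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank that meets its BH bound
        reject[order[: k + 1]] = True
    return reject


def confirmed(hit_counts, n_iter, alpha=0.05, two_step=True):
    """Which features to confirm after n_iter rounds, given their hit counts."""
    pvals = np.array([binom_sf_half(h, n_iter) for h in hit_counts])
    if two_step:
        # Step 1: FDR across features; step 2: Bonferroni across iterations.
        return benjamini_hochberg(pvals, alpha) & (pvals <= alpha / n_iter)
    # Original Boruta: Bonferroni correction over the number of features only.
    return pvals <= alpha / len(hit_counts)


hits = [20, 16, 10, 3]  # hypothetical hit counts after 20 rounds
print(confirmed(hits, 20).tolist())                  # [True, False, False, False]
print(confirmed(hits, 20, two_step=False).tolist())  # [True, True, False, False]
```

With these four features, the two-step procedure confirms only the first feature, while Bonferroni over four features also confirms the second. When there are many more features than iterations, the relative strictness can reverse.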

max_iter = param.Integer(readonly=False)

The maximum number of iterations to perform.

random_state = param.Parameter(readonly=False)

If an int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
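These three forms follow the scikit-learn convention implemented by sklearn.utils.check_random_state. This sketch assumes the value reaches BorutaPy unchanged and is interpreted that way, and shows how each form resolves:

```python
import numpy as np
from sklearn.utils import check_random_state

# An int seeds a fresh RandomState, so equal seeds give equal draws.
a = check_random_state(42).randint(0, 100, size=5)
b = check_random_state(42).randint(0, 100, size=5)

# A RandomState instance is passed through unchanged.
rs = np.random.RandomState(0)
same_instance = check_random_state(rs) is rs

# None falls back to the global RandomState singleton behind np.random.
global_rs = check_random_state(None)
```

An int seed is the only form that makes repeated prospector runs reproducible. None and a shared instance both carry state across calls.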

verbose = param.Integer(readonly=False)

Controls the verbosity of output:
  • 0: no output
  • 1: displays the iteration number
  • 2: displays which features have been selected already

estimator = None
n_estimators = 1000
perc = 100
alpha = 0.05
two_step = True
max_iter = 100
random_state = None
verbose = 0
name = 'BorutaProspector'

class GSForge.operations.prospectors.ChiSquaredTest(*args, **params)

Bases: GSForge.models._Interface.Interface, param.parameterized.ParameterizedFunction

Compute chi-squared statistics between each non-negative feature and the class labels. See the scikit-learn documentation.
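This sketch assumes ChiSquaredTest delegates to sklearn.feature_selection.chi2, whose description it matches. It calls that function directly on a hypothetical count matrix and shows the non-negativity requirement:

```python
import numpy as np
from sklearn.feature_selection import chi2

# Hypothetical non-negative counts: 6 samples (rows) x 3 genes (columns).
counts = np.array([
    [10, 0, 5],
    [12, 1, 4],
    [11, 0, 6],
    [0, 1, 5],
    [1, 0, 4],
    [0, 1, 6],
])
labels = np.array(["ctrl", "ctrl", "ctrl", "treat", "treat", "treat"])

chi2_stats, p_values = chi2(counts, labels)
print(np.argmax(chi2_stats))  # 0: gene 0 separates the two classes

# Negative inputs are rejected, so this test suits raw counts rather than centred data.
try:
    chi2(counts - 5, labels)
    negative_rejected = False
except ValueError:
    negative_rejected = True
```

Gene 2 has equal totals in both classes, so its statistic is zero. The statistic compares per-class sums with the sums expected from class frequencies alone.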

Parameters inherited from:

GSForge.models._Interface.Interface: gem, gene_set_collection, selected_gene_sets, selected_genes, gene_set_mode, sample_subset, count_variable, annotation_variables, count_mask, annotation_mask, count_transform

name = 'ChiSquaredTest'

class GSForge.operations.prospectors.FClassificationTest(*args, **params)

Bases: GSForge.models._Interface.Interface, param.parameterized.ParameterizedFunction

Compute the ANOVA F-value for the provided sample. See the scikit-learn documentation.
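This sketch assumes FClassificationTest wraps sklearn.feature_selection.f_classif, whose description it matches. It calls that function directly on a hypothetical expression matrix and ranks genes by F-value:

```python
import numpy as np
from sklearn.feature_selection import f_classif

# Hypothetical expression matrix: 6 samples x 3 genes, two groups of 3 samples.
expr = np.array([
    [5.0, 2.0, 1.0],
    [5.5, 2.1, 3.0],
    [4.5, 1.9, 2.0],
    [9.0, 2.0, 3.5],
    [9.5, 2.2, 2.5],
    [8.5, 1.8, 3.0],
])
groups = np.array([0, 0, 0, 1, 1, 1])

f_values, p_values = f_classif(expr, groups)

# Rank genes by F-value (largest first) and keep the top two.
top_two = np.argsort(f_values)[::-1][:2]
print(top_two)
```

The F-value is the between-group variance divided by the within-group variance. Gene 0's group means differ by far more than its spread within each group, so it ranks first.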

Parameters inherited from:

GSForge.models._Interface.Interface: gem, gene_set_collection, selected_gene_sets, selected_genes, gene_set_mode, sample_subset, count_variable, annotation_variables, count_mask, annotation_mask, count_transform

name = 'FClassificationTest'