GSForge.operations Package
GSForge operations can be broken down into three categories:
- Analytics
For discrete operations such as chi-squared tests or differential gene expression.
- Normalizations
For those operations that are meant to create an entire transform of the GEM.
- Prospectors
For non-deterministic operations, used in ranking and comparing gene selections.
-
class GSForge.operations.get_data(*args, **params)
Bases: GSForge.models._OperationInterface.OperationInterface
Gets the GEM matrix and an optional annotation column.
gem
= param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>) An AnnotatedGEM object.
gene_set_collection
= param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>) A GeneSetCollection object.
selected_gene_sets
= param.ListSelector(default=[None], objects=[]) A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are used to select sets of genes from the count matrix.
selected_genes
= param.Parameter() A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selection methods, so any selected gene sets (or combinations thereof) will have no effect.
gene_set_mode
= param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection']) Controls how any selected gene sets are returned by the interface.
+ 'complete' returns the entire gene set of the AnnotatedGEM.
+ 'union' returns the union of the selected gene sets' support.
+ 'intersection' returns the intersection of the selected gene sets' support.
sample_subset
= param.Parameter() A list of samples to use in a given operation. These can be supplied directly as a list, or can be drawn from a given GeneSet.
count_variable
= param.String() The name of the count matrix used.
annotation_variables
= param.Parameter() The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask
= param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped']) The type of mask to use for the count matrix.
+ 'complete' returns the entire count matrix as numbers.
+ 'masked' returns the entire count matrix with zero or missing values as NaN.
+ 'dropped' returns the count matrix without genes that have zero or missing values.
annotation_mask
= param.ObjectSelector(default='complete', objects=['complete', 'dropped']) The type of mask to use for the target array.
+ 'complete' returns the entire target array.
+ 'masked' returns the entire target array with zero or missing values as NaN.
+ 'dropped' returns the target array without samples that have zero or missing values.
count_transform
= param.Callable() A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
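The three gene_set_mode options can be illustrated with plain Python sets. This is only a sketch of the selection logic as described above, not GSForge's implementation; the function name `combine_gene_sets` and the toy gene names are hypothetical.

```python
def combine_gene_sets(all_genes, gene_sets, selected_keys, mode="union"):
    """Combine the supports of the selected gene sets (illustrative only)."""
    if mode == "complete":
        # Ignore the selected sets and return every gene in the matrix.
        return sorted(all_genes)
    supports = [set(gene_sets[key]) for key in selected_keys]
    if not supports:
        return []
    if mode == "union":
        combined = set.union(*supports)
    elif mode == "intersection":
        combined = set.intersection(*supports)
    else:
        raise ValueError(f"Unknown gene_set_mode: {mode!r}")
    return sorted(combined)


# Hypothetical toy data standing in for an AnnotatedGEM and a GeneSetCollection.
all_genes = ["g1", "g2", "g3", "g4", "g5"]
gene_sets = {"set_a": ["g1", "g2", "g3"], "set_b": ["g2", "g3", "g4"]}
keys = ["set_a", "set_b"]

union_genes = combine_gene_sets(all_genes, gene_sets, keys, "union")
shared_genes = combine_gene_sets(all_genes, gene_sets, keys, "intersection")
every_gene = combine_gene_sets(all_genes, gene_sets, keys, "complete")
```

Here 'union' keeps any gene found in at least one selected set, 'intersection' keeps only genes found in all of them, and 'complete' ignores the selection.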
-
property
active_count_variable
¶ Returns the name of the currently active count matrix.
-
debug
(**kwargs)¶ Inspect .param.debug method for the full docstring
-
defaults
(**kwargs)¶ Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.get_data'>)¶
-
property
gene_index_name
¶ Returns the name of the gene index.
-
get_gene_index
(count_variable=None) → numpy.core.multiarray.array¶ Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.get_data'>)¶
-
get_sample_index
() → numpy.core.multiarray.array¶ Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes
() → dict¶ Returns the currently selected indexes as a dictionary.
-
get_value_generator
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.get_data'>)¶
-
inspect_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.get_data'>)¶
-
instance
= functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.operations.get_data'>)¶
-
message
(**kwargs)¶ Inspect .param.message method for the full docstring
-
params
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.get_data'>)¶
-
pprint
(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')¶ Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod
print_param_defaults
(*args, **kwargs)¶ Inspect .param.print_param_defaults method for the full docstring
-
print_param_values
(**kwargs)¶ Inspect .param.print_param_values method for the full docstring
-
property
sample_index_name
¶ Returns the name of the sample index.
-
script_repr
(imports=[], prefix=' ')¶ Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property
selection
¶ Returns the currently selected data.
-
classmethod
set_default
(*args, **kwargs)¶ Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.get_data'>)¶
-
set_param
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.get_data'>)¶
-
state_pop
()¶ Restore the most recently saved state.
See state_push() for more details.
-
state_push
()¶ Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose
(**kwargs)¶ Inspect .param.verbose method for the full docstring
-
warning
(**kwargs)¶ Inspect .param.warning method for the full docstring
-
property
x_count_data
¶ Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An xarray.Dataset selection of the currently active ‘x_data’.
-
property
y_annotation_data
¶ Returns the currently selected ‘y_data’, or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.
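The count_mask options described for this interface can be sketched on a small dictionary of counts. This is a pure-Python illustration under stated assumptions, not the xarray-based code GSForge uses: `apply_count_mask` is a hypothetical helper, and 'complete' is assumed to return the values unchanged.

```python
import math


def _is_empty(value):
    """Treat None, NaN and zero as 'zero or missing'."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == 0


def apply_count_mask(counts, mask="complete"):
    """counts maps each gene to a list of per-sample values."""
    if mask == "complete":
        # Assumption: the matrix is returned as-is.
        return {gene: list(values) for gene, values in counts.items()}
    if mask == "masked":
        # Same shape, but zero or missing entries become NaN.
        return {
            gene: [math.nan if _is_empty(v) else v for v in values]
            for gene, values in counts.items()
        }
    if mask == "dropped":
        # Remove every gene that has a zero or missing value in any sample.
        return {
            gene: list(values)
            for gene, values in counts.items()
            if not any(_is_empty(v) for v in values)
        }
    raise ValueError(f"Unknown count_mask: {mask!r}")


counts = {"g1": [5, 0, 3], "g2": [2, 4, 1], "g3": [None, 7, 2]}
masked = apply_count_mask(counts, "masked")
dropped = apply_count_mask(counts, "dropped")
```

Only g2 survives 'dropped' here, because g1 contains a zero and g3 contains a missing value.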
analytics Module
Analytics are intended to rank or compare a GEM subset more closely, rather than the entire GEM.
These functions are intended for analyzing and comparing subsets generated by the functions found in prospectors.
The methods and notation used here are taken from [method_compare].
- \(LS\)
Learning Sample, \(n\) instances of input-output values.
- \(n\)
Number of input-output value pairs in \(LS\).
- \(m\)
Number of input variables (features or genes) in \(LS\).
- \(X_i\)
Input array of \(LS\). Ranges from \(i=1, ..., m\).
- \(LS\)
An algorithm that outputs some relevance score, \(s_i\), for each input variable \(X_i\).
- method_compare
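To make the notation concrete, the sketch below builds a small learning sample with \(n = 4\) instances and \(m = 2\) input variables, then computes one relevance score \(s_i\) per input variable \(X_i\). The scoring rule used here (absolute Pearson correlation) is an assumption chosen for illustration; it is not taken from [method_compare] or from GSForge.

```python
import math


def relevance_scores(inputs, outputs):
    """Return one score s_i per input variable X_i (absolute Pearson correlation)."""
    n = len(outputs)      # number of input-output pairs in LS
    m = len(inputs[0])    # number of input variables in LS
    mean_y = sum(outputs) / n
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    scores = []
    for i in range(m):
        x_i = [row[i] for row in inputs]
        mean_x = sum(x_i) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_i, outputs))
        var_x = sum((x - mean_x) ** 2 for x in x_i)
        denom = math.sqrt(var_x * var_y)
        scores.append(abs(cov / denom) if denom else 0.0)
    return scores


# LS with n = 4 instances and m = 2 input variables.
inputs = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]]
outputs = [2.0, 4.0, 6.0, 8.0]
scores = relevance_scores(inputs, outputs)
```

The first input variable tracks the output exactly and scores 1, while the second is uncorrelated with it and scores 0.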
-
class GSForge.operations.analytics.rank_genes_by_model(*args, **params)
Bases: GSForge.models._OperationInterface.OperationInterface
Given some machine learning model, this operation runs n_iterations and returns a summary dataset of the ranking results.
gem
= param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>) An AnnotatedGEM object.
gene_set_collection
= param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>) A GeneSetCollection object.
selected_gene_sets
= param.ListSelector(default=[None], objects=[]) A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are used to select sets of genes from the count matrix.
selected_genes
= param.Parameter() A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selection methods, so any selected gene sets (or combinations thereof) will have no effect.
gene_set_mode
= param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection']) Controls how any selected gene sets are returned by the interface.
+ 'complete' returns the entire gene set of the AnnotatedGEM.
+ 'union' returns the union of the selected gene sets' support.
+ 'intersection' returns the intersection of the selected gene sets' support.
sample_subset
= param.Parameter() A list of samples to use in a given operation. These can be supplied directly as a list, or can be drawn from a given GeneSet.
count_variable
= param.String() The name of the count matrix used.
annotation_variables
= param.Parameter() The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask
= param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped']) The type of mask to use for the count matrix.
+ 'complete' returns the entire count matrix as numbers.
+ 'masked' returns the entire count matrix with zero or missing values as NaN.
+ 'dropped' returns the count matrix without genes that have zero or missing values.
annotation_mask
= param.ObjectSelector(default='complete', objects=['complete', 'dropped']) The type of mask to use for the target array.
+ 'complete' returns the entire target array.
+ 'masked' returns the entire target array with zero or missing values as NaN.
+ 'dropped' returns the target array without samples that have zero or missing values.
count_transform
= param.Callable() A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
model
= param.Parameter()
n_iterations
= param.Integer(default=1, inclusive_bounds=(True, True), time_dependent=False, time_fn=Time(label='Time', name='Time00001', time_type=<class 'int'>, timestep=1.0, unit=None, until=Infinity()))
-
property
active_count_variable
¶ Returns the name of the currently active count matrix.
-
debug
(**kwargs)¶ Inspect .param.debug method for the full docstring
-
defaults
(**kwargs)¶ Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.rank_genes_by_model'>)¶
-
property
gene_index_name
¶ Returns the name of the gene index.
-
get_gene_index
(count_variable=None) → numpy.core.multiarray.array¶ Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.rank_genes_by_model'>)¶
-
get_sample_index
() → numpy.core.multiarray.array¶ Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes
() → dict¶ Returns the currently selected indexes as a dictionary.
-
get_value_generator
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.rank_genes_by_model'>)¶
-
inspect_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.rank_genes_by_model'>)¶
-
instance
= functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.operations.analytics.rank_genes_by_model'>)¶
-
message
(**kwargs)¶ Inspect .param.message method for the full docstring
-
params
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.rank_genes_by_model'>)¶
-
pprint
(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')¶ Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod
print_param_defaults
(*args, **kwargs)¶ Inspect .param.print_param_defaults method for the full docstring
-
print_param_values
(**kwargs)¶ Inspect .param.print_param_values method for the full docstring
-
property
sample_index_name
¶ Returns the name of the sample index.
-
script_repr
(imports=[], prefix=' ')¶ Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property
selection
¶ Returns the currently selected data.
-
classmethod
set_default
(*args, **kwargs)¶ Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.rank_genes_by_model'>)¶
-
set_param
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.rank_genes_by_model'>)¶
-
state_pop
()¶ Restore the most recently saved state.
See state_push() for more details.
-
state_push
()¶ Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose
(**kwargs)¶ Inspect .param.verbose method for the full docstring
-
warning
(**kwargs)¶ Inspect .param.warning method for the full docstring
-
property
x_count_data
¶ Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An xarray.Dataset selection of the currently active ‘x_data’.
-
property
y_annotation_data
¶ Returns the currently selected ‘y_data’, or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.
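The idea behind rank_genes_by_model, fitting a model n_iterations times and summarizing the resulting feature importances, can be sketched without GSForge. Everything below is hypothetical: `ToyModel` stands in for any estimator that exposes fit() and feature_importances_, and the summary layout (mean and standard deviation per gene) is an assumption rather than the dataset GSForge returns.

```python
import random
import statistics


def abs_covariance(xs, ys):
    """Unnormalized absolute covariance, used as a toy importance score."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)))


class ToyModel:
    """Hypothetical stand-in for an estimator with fit() and feature_importances_."""

    def __init__(self, rng):
        self.rng = rng
        self.feature_importances_ = None

    def fit(self, x, y):
        # Score features on a bootstrap resample so repeated fits differ.
        n = len(y)
        rows = [self.rng.randrange(n) for _ in range(n)]
        x_boot = [x[r] for r in rows]
        y_boot = [y[r] for r in rows]
        raw = [
            abs_covariance([row[i] for row in x_boot], y_boot)
            for i in range(len(x[0]))
        ]
        total = sum(raw) or 1.0
        self.feature_importances_ = [value / total for value in raw]
        return self


def rank_features(make_model, x, y, feature_names, n_iterations=1):
    """Fit a fresh model per iteration and summarize importances per feature."""
    runs = [make_model().fit(x, y).feature_importances_ for _ in range(n_iterations)]
    summary = {}
    for i, name in enumerate(feature_names):
        values = [run[i] for run in runs]
        summary[name] = {
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),
        }
    return summary


rng = random.Random(0)
x = [[1, 2], [2, 5], [3, 3], [4, 4], [5, 2], [6, 5], [7, 3], [8, 4]]
y = [1, 2, 3, 4, 5, 6, 7, 8]
summary = rank_features(lambda: ToyModel(rng), x, y, ["gene_a", "gene_b"], n_iterations=10)
ranking = sorted(summary, key=lambda name: summary[name]["mean"], reverse=True)
```

Averaging over several iterations matters when the model is non-deterministic, since a single fit may rank features differently from the next.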
-
class GSForge.operations.analytics.nFDR(*args, **params)
Bases: GSForge.models._OperationInterface.OperationInterface
nFDR (False Discovery Rate) [method_compare].
nFDR trains two models and compares their feature_importances_ attributes to estimate the false discovery rate.
The estimated FDR is the percentage of instances in which a shuffled-output feature has a higher feature importance score than the same non-shuffled feature.
This is repeated up to n_iterations times.
gem
= param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>) An AnnotatedGEM object.
gene_set_collection
= param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>) A GeneSetCollection object.
selected_gene_sets
= param.ListSelector(default=[None], objects=[]) A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are used to select sets of genes from the count matrix.
selected_genes
= param.Parameter() A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selection methods, so any selected gene sets (or combinations thereof) will have no effect.
gene_set_mode
= param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection']) Controls how any selected gene sets are returned by the interface.
+ 'complete' returns the entire gene set of the AnnotatedGEM.
+ 'union' returns the union of the selected gene sets' support.
+ 'intersection' returns the intersection of the selected gene sets' support.
sample_subset
= param.Parameter() A list of samples to use in a given operation. These can be supplied directly as a list, or can be drawn from a given GeneSet.
count_variable
= param.String() The name of the count matrix used.
annotation_variables
= param.Parameter() The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask
= param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped']) The type of mask to use for the count matrix.
+ 'complete' returns the entire count matrix as numbers.
+ 'masked' returns the entire count matrix with zero or missing values as NaN.
+ 'dropped' returns the count matrix without genes that have zero or missing values.
annotation_mask
= param.ObjectSelector(default='complete', objects=['complete', 'dropped']) The type of mask to use for the target array.
+ 'complete' returns the entire target array.
+ 'masked' returns the entire target array with zero or missing values as NaN.
+ 'dropped' returns the target array without samples that have zero or missing values.
count_transform
= param.Callable() A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
model
= param.Parameter()
n_iterations
= param.Integer(default=1, inclusive_bounds=(True, True), time_dependent=False, time_fn=Time(label='Time', name='Time00001', time_type=<class 'int'>, timestep=1.0, unit=None, until=Infinity()))
-
property
active_count_variable
¶ Returns the name of the currently active count matrix.
-
debug
(**kwargs)¶ Inspect .param.debug method for the full docstring
-
defaults
(**kwargs)¶ Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.nFDR'>)¶
-
property
gene_index_name
¶ Returns the name of the gene index.
-
get_gene_index
(count_variable=None) → numpy.core.multiarray.array¶ Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.nFDR'>)¶
-
get_sample_index
() → numpy.core.multiarray.array¶ Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes
() → dict¶ Returns the currently selected indexes as a dictionary.
-
get_value_generator
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.nFDR'>)¶
-
inspect_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.nFDR'>)¶
-
instance
= functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.operations.analytics.nFDR'>)¶
-
message
(**kwargs)¶ Inspect .param.message method for the full docstring
-
params
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.nFDR'>)¶
-
pprint
(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')¶ Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod
print_param_defaults
(*args, **kwargs)¶ Inspect .param.print_param_defaults method for the full docstring
-
print_param_values
(**kwargs)¶ Inspect .param.print_param_values method for the full docstring
-
property
sample_index_name
¶ Returns the name of the sample index.
-
script_repr
(imports=[], prefix=' ')¶ Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property
selection
¶ Returns the currently selected data.
-
classmethod
set_default
(*args, **kwargs)¶ Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.nFDR'>)¶
-
set_param
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.nFDR'>)¶
-
state_pop
()¶ Restore the most recently saved state.
See state_push() for more details.
-
state_push
()¶ Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose
(**kwargs)¶ Inspect .param.verbose method for the full docstring
-
warning
(**kwargs)¶ Inspect .param.warning method for the full docstring
-
property
x_count_data
¶ Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An xarray.Dataset selection of the currently active ‘x_data’.
-
property
y_annotation_data
¶ Returns the currently selected ‘y_data’, or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.
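The nFDR description above can be read as the following procedure, sketched here in plain Python. It assumes that the output values are what gets shuffled, and it replaces the trained models with a hypothetical importance function (absolute covariance); neither choice is confirmed by the GSForge source.

```python
import random


def abs_covariance(xs, ys):
    """Toy importance score standing in for a model's feature_importances_."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)))


def feature_importances(x, y):
    return [abs_covariance([row[i] for row in x], y) for i in range(len(x[0]))]


def estimate_nfdr(x, y, n_iterations=100, seed=0):
    """Fraction of iterations in which the shuffled-output score beats the real one."""
    rng = random.Random(seed)
    exceed = [0] * len(x[0])
    for _ in range(n_iterations):
        # "Model" one: importances against the real output.
        real = feature_importances(x, y)
        # "Model" two: importances against a shuffled copy of the output.
        shuffled_y = list(y)
        rng.shuffle(shuffled_y)
        null = feature_importances(x, shuffled_y)
        for i, (null_score, real_score) in enumerate(zip(null, real)):
            if null_score > real_score:
                exceed[i] += 1
    return [count / n_iterations for count in exceed]


# The first feature equals the output; the second is low-variance noise.
x = [[1, 2], [2, 5], [3, 3], [4, 4], [5, 2], [6, 5], [7, 3], [8, 4]]
y = [1, 2, 3, 4, 5, 6, 7, 8]
fdr = estimate_nfdr(x, y, n_iterations=100)
```

A feature whose real score is never beaten by a shuffled-output score gets an estimate of 0, as the first feature does here; uninformative features drift toward higher values.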
-
class GSForge.operations.analytics.mProbes(*args, **params)
Bases: GSForge.models._OperationInterface.OperationInterface
mProbes [method_compare] works by randomly permuting the feature values in the supplied data, e.g. count values are shuffled within each sample's feature (gene) array.
It then ranks the real and shadowed features (for n_iterations) with the supplied model via a call to model.fit(). It then examines model.feature_importances_ for the feature importance values and calculates the null rank distribution.
This is repeated up to n_iterations times.
gem
= param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>) An AnnotatedGEM object.
gene_set_collection
= param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>) A GeneSetCollection object.
selected_gene_sets
= param.ListSelector(default=[None], objects=[]) A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are used to select sets of genes from the count matrix.
selected_genes
= param.Parameter() A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selection methods, so any selected gene sets (or combinations thereof) will have no effect.
gene_set_mode
= param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection']) Controls how any selected gene sets are returned by the interface.
+ 'complete' returns the entire gene set of the AnnotatedGEM.
+ 'union' returns the union of the selected gene sets' support.
+ 'intersection' returns the intersection of the selected gene sets' support.
sample_subset
= param.Parameter() A list of samples to use in a given operation. These can be supplied directly as a list, or can be drawn from a given GeneSet.
count_variable
= param.String() The name of the count matrix used.
annotation_variables
= param.Parameter() The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask
= param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped']) The type of mask to use for the count matrix.
+ 'complete' returns the entire count matrix as numbers.
+ 'masked' returns the entire count matrix with zero or missing values as NaN.
+ 'dropped' returns the count matrix without genes that have zero or missing values.
annotation_mask
= param.ObjectSelector(default='complete', objects=['complete', 'dropped']) The type of mask to use for the target array.
+ 'complete' returns the entire target array.
+ 'masked' returns the entire target array with zero or missing values as NaN.
+ 'dropped' returns the target array without samples that have zero or missing values.
count_transform
= param.Callable() A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
model
= param.Parameter()
n_iterations
= param.Integer(default=1, inclusive_bounds=(True, True), time_dependent=False, time_fn=Time(label='Time', name='Time00001', time_type=<class 'int'>, timestep=1.0, unit=None, until=Infinity()))
-
property
active_count_variable
¶ Returns the name of the currently active count matrix.
-
debug
(**kwargs)¶ Inspect .param.debug method for the full docstring
-
defaults
(**kwargs)¶ Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.mProbes'>)¶
-
property
gene_index_name
¶ Returns the name of the gene index.
-
get_gene_index
(count_variable=None) → numpy.core.multiarray.array¶ Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.mProbes'>)¶
-
get_sample_index
() → numpy.core.multiarray.array¶ Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes
() → dict¶ Returns the currently selected indexes as a dictionary.
-
get_value_generator
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.mProbes'>)¶
-
inspect_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.mProbes'>)¶
-
instance
= functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.operations.analytics.mProbes'>)¶
-
message
(**kwargs)¶ Inspect .param.message method for the full docstring
-
params
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.mProbes'>)¶
-
pprint
(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')¶ Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod
print_param_defaults
(*args, **kwargs)¶ Inspect .param.print_param_defaults method for the full docstring
-
print_param_values
(**kwargs)¶ Inspect .param.print_param_values method for the full docstring
-
property
sample_index_name
¶ Returns the name of the sample index.
-
script_repr
(imports=[], prefix=' ')¶ Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property
selection
¶ Returns the currently selected data.
-
classmethod
set_default
(*args, **kwargs)¶ Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.mProbes'>)¶
-
set_param
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.analytics.mProbes'>)¶
-
state_pop
()¶ Restore the most recently saved state.
See state_push() for more details.
-
state_push
()¶ Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose
(**kwargs)¶ Inspect .param.verbose method for the full docstring
-
warning
(**kwargs)¶ Inspect .param.warning method for the full docstring
-
property
x_count_data
¶ Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An xarray.Dataset selection of the currently active ‘x_data’.
-
property
y_annotation_data
¶ Returns the currently selected ‘y_data’, or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.
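A rough sketch of the shadow-feature idea behind mProbes, as described above, follows. It rests on several assumptions: each real feature gets one permuted ("shadow") copy per iteration, a hypothetical importance function (absolute covariance) replaces model.fit() and model.feature_importances_, and the reported value is the fraction of iterations in which the best shadow feature outranks a given real feature. The exact statistic GSForge computes may differ.

```python
import random


def abs_covariance(xs, ys):
    """Toy importance score standing in for a fitted model's importances."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)))


def estimate_mprobes(x, y, n_iterations=50, seed=0):
    """Fraction of iterations in which some shadow feature outranks each real feature."""
    rng = random.Random(seed)
    columns = [[row[i] for row in x] for i in range(len(x[0]))]
    outranked = [0] * len(columns)
    for _ in range(n_iterations):
        # Build one shadow feature per real feature by permuting its values.
        shadows = []
        for column in columns:
            shadow = list(column)
            rng.shuffle(shadow)
            shadows.append(shadow)
        # Score real and shadow features together.
        real_scores = [abs_covariance(column, y) for column in columns]
        best_shadow = max(abs_covariance(shadow, y) for shadow in shadows)
        for i, score in enumerate(real_scores):
            if best_shadow > score:
                outranked[i] += 1
    return [count / n_iterations for count in outranked]


# The first feature equals the output; the second is low-variance noise.
x = [[1, 2], [2, 5], [3, 3], [4, 4], [5, 2], [6, 5], [7, 3], [8, 4]]
y = [1, 2, 3, 4, 5, 6, 7, 8]
rates = estimate_mprobes(x, y, n_iterations=50)
```

Real features that are rarely outranked by any shadow feature are the ones worth keeping; in this toy data the first feature is never outranked.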
normalizations Module
Normalization functions inherit from the OperationInterface class.
This means that they can all be called upon an AnnotatedGEM or a GeneSetCollection.
These functions (classes) have static methods that implement the transform on a numpy or xarray source.
-
class GSForge.operations.normalizations.ReadsPerKilobaseMillion(*args, **params)
Bases: GSForge.models._OperationInterface.OperationInterface
RPKM or FPKM – Reads or Fragments per Kilobase Million.
These methods attempt to compensate for sequencing depth and gene length. The utility of this method is disputed in the literature [cite me].
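For reference, the usual RPKM calculation divides each gene's read count by the sample's total reads in millions (sequencing depth) and by the gene's length in kilobases. The sketch below applies that standard formula to a single sample in plain Python; the function name and toy values are hypothetical, and it is not the static method this class provides.

```python
def reads_per_kilobase_million(sample_counts, gene_lengths):
    """Compute RPKM for one sample.

    sample_counts maps gene -> read count, gene_lengths maps gene -> length in bp.
    """
    per_million = sum(sample_counts.values()) / 1e6  # sequencing depth scaling
    return {
        gene: (count / per_million) / (gene_lengths[gene] / 1e3)
        for gene, count in sample_counts.items()
    }


sample_counts = {"g1": 500_000, "g2": 1_500_000}
gene_lengths = {"g1": 2000, "g2": 1000}
rpkm = reads_per_kilobase_million(sample_counts, gene_lengths)
```

With 2 million total reads, g1 (2 kb) yields 125,000 and g2 (1 kb) yields 750,000.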
gem
= param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>) An AnnotatedGEM object.
gene_set_collection
= param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>) A GeneSetCollection object.
selected_gene_sets
= param.ListSelector(default=[None], objects=[]) A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are used to select sets of genes from the count matrix.
selected_genes
= param.Parameter() A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selection methods, so any selected gene sets (or combinations thereof) will have no effect.
gene_set_mode
= param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection']) Controls how any selected gene sets are returned by the interface.
+ 'complete' returns the entire gene set of the AnnotatedGEM.
+ 'union' returns the union of the selected gene sets' support.
+ 'intersection' returns the intersection of the selected gene sets' support.
sample_subset
= param.Parameter() A list of samples to use in a given operation. These can be supplied directly as a list, or can be drawn from a given GeneSet.
count_variable
= param.String() The name of the count matrix used.
annotation_variables
= param.Parameter() The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask
= param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped']) The type of mask to use for the count matrix.
+ 'complete' returns the entire count matrix as numbers.
+ 'masked' returns the entire count matrix with zero or missing values as NaN.
+ 'dropped' returns the count matrix without genes that have zero or missing values.
annotation_mask
= param.ObjectSelector(default='complete', objects=['complete', 'dropped']) The type of mask to use for the target array.
+ 'complete' returns the entire target array.
+ 'masked' returns the entire target array with zero or missing values as NaN.
+ 'dropped' returns the target array without samples that have zero or missing values.
count_transform
= param.Callable() A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
length_variable
= param.String(default='lengths')
-
property
active_count_variable
¶ Returns the name of the currently active count matrix.
-
debug
(**kwargs)¶ Inspect .param.debug method for the full docstring
-
defaults
(**kwargs)¶ Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.ReadsPerKilobaseMillion'>)¶
-
property
gene_index_name
¶ Returns the name of the gene index.
-
get_gene_index
(count_variable=None) → numpy.core.multiarray.array¶ Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.ReadsPerKilobaseMillion'>)¶
-
get_sample_index
() → numpy.core.multiarray.array¶ Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes
() → dict¶ Returns the currently selected indexes as a dictionary.
-
get_value_generator
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.ReadsPerKilobaseMillion'>)¶
-
inspect_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.ReadsPerKilobaseMillion'>)¶
-
instance
= functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.operations.normalizations.ReadsPerKilobaseMillion'>)¶
-
message
(**kwargs)¶ Inspect .param.message method for the full docstring
-
params
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.ReadsPerKilobaseMillion'>)¶
-
pprint
(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')¶ Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod
print_param_defaults
(*args, **kwargs)¶ Inspect .param.print_param_defaults method for the full docstring
-
print_param_values
(**kwargs)¶ Inspect .param.print_param_values method for the full docstring
-
property
sample_index_name
¶ Returns the name of the sample index.
-
script_repr
(imports=[], prefix=' ')¶ Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property
selection
¶ Returns the currently selected data.
-
classmethod
set_default
(*args, **kwargs)¶ Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.ReadsPerKilobaseMillion'>)¶
-
set_param
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.ReadsPerKilobaseMillion'>)¶
-
state_pop
()¶ Restore the most recently saved state.
See state_push() for more details.
-
state_push
()¶ Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose
(**kwargs)¶ Inspect .param.verbose method for the full docstring
-
warning
(**kwargs)¶ Inspect .param.warning method for the full docstring
-
property
x_count_data
¶ Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An Xarray.Dataset selection of the currently active ‘x_data’.
-
property
y_annotation_data
¶ Returns the currently selected ‘y_data’, or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.
-
class
GSForge.operations.normalizations.
UpperQuartile
(*args, **params)[source]¶ Bases:
GSForge.models._OperationInterface.OperationInterface
Under this normalization method, after removing genes having zero read counts for all samples, the remaining gene counts are divided by the upper quartile of counts different from zero in the computation of the normalization factors associated with their sample and multiplied by the mean upper quartile across all samples of the dataset. [method_compare]
Original R code.
uq<-function(X){
  #excluding zero counts in each sample
  UQ<-function(y){
    quantile(y, 0.75)
  }
  X<-X+0.1
  upperQ<-apply(X,2,UQ)
  f.uq<-upperQ/mean(upperQ)
  upq.res<-scale(X,center=FALSE,scale=f.uq)
  return(upq.res)
}
- method_compare
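The R function above can be sketched in NumPy as follows. This is an illustrative translation, not the source of np_upper_quartile; it assumes the (samples by genes) layout described for that method, so the per-sample quantile is taken along axis 1 rather than over columns as in the R code. Like the R code, it adds a pseudocount of 0.1 instead of dropping zero counts. The function name `upper_quartile_normalize` is hypothetical.

```python
import numpy as np


def upper_quartile_normalize(counts, pseudocount=0.1):
    """Sketch of the R `uq` function for a (samples, genes) array."""
    shifted = np.asarray(counts, dtype=float) + pseudocount
    # Upper quartile (75th percentile) of each sample, i.e. each row.
    upper_q = np.quantile(shifted, 0.75, axis=1)
    # Scale factors relative to the mean upper quartile across samples.
    factors = upper_q / upper_q.mean()
    # Divide every sample by its own factor.
    return shifted / factors[:, np.newaxis]


# Hypothetical data: 2 samples, 4 genes.
raw_counts = np.array([[0, 10, 20, 30], [0, 20, 40, 60]])
normalized = upper_quartile_normalize(raw_counts)
```

After normalization every sample has the same upper quartile, equal to the mean of the original per-sample upper quartiles.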
gem
= param.ClassSelector(class_=<class ‘GSForge.models._AnnotatedGEM.AnnotatedGEM’>)An AnnotatedGEM object.
gene_set_collection
= param.ClassSelector(class_=<class ‘GSForge.models._GeneSetCollection.GeneSetCollection’>)A GeneSetCollection object.
selected_gene_sets
= param.ListSelector(default=[None], objects=[])A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are to be used for selecting sets of genes from the count matrix.
selected_genes
= param.Parameter()A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selection methods, so selected lineaments (or combinations thereof) have no effect.
gene_set_mode
= param.ObjectSelector(default=’union’, objects=[‘complete’, ‘union’, ‘intersection’])Controls how any selected gene sets are returned by the interface. + complete Returns the entire gene set of the AnnotatedGEM. + union Returns the union of the selected gene sets support. + intersection Returns the intersection of the selected gene sets support.
sample_subset
= param.Parameter()A list of samples to use in a given operation. These can be supplied directly as a list of genes, or can be drawn from a given GeneSet.
count_variable
= param.String()The name of the count matrix used.
annotation_variables
= param.Parameter()The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask
= param.ObjectSelector(default=’complete’, objects=[‘complete’, ‘masked’, ‘dropped’])The type of mask to use for the count matrix. + ‘complete’ returns the entire count matrix as numbers. + ‘masked’ returns the entire count matrix with zero or missing as NaN values. + ‘dropped’ returns the count matrix without genes that have zero or missing values.
annotation_mask
= param.ObjectSelector(default=’complete’, objects=[‘complete’, ‘dropped’])The type of mask to use for the target array. + ‘complete’ returns the entire target array. + ‘masked’ returns the entire target array with zero or missing as NaN values. + ‘dropped’ returns the target array without samples that have zero or missing values.
count_transform
= param.Callable()A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
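The three count_mask options described above can be illustrated with a small NumPy sketch. This is an assumption-laden illustration, not GSForge's implementation: it works on a plain (samples by genes) array, treats ‘complete’ as returning the matrix unchanged, and the function name `apply_count_mask` is hypothetical.

```python
import numpy as np


def apply_count_mask(counts, mode="complete"):
    """Illustrate 'complete', 'masked' and 'dropped' on a (samples, genes) array."""
    counts = np.asarray(counts, dtype=float)
    # A value counts as missing when it is zero or NaN.
    missing = np.isnan(counts) | (counts == 0)
    if mode == "complete":
        return counts
    if mode == "masked":
        # Same shape, with zero or missing entries replaced by NaN.
        return np.where(missing, np.nan, counts)
    if mode == "dropped":
        # Keep only genes (columns) with no zero or missing value in any sample.
        return counts[:, ~missing.any(axis=0)]
    raise ValueError(f"Unknown mask mode: {mode!r}")


# Hypothetical data: 2 samples, 3 genes.
example = np.array([[0.0, 3.0, 5.0], [2.0, 4.0, np.nan]])
masked = apply_count_mask(example, "masked")
dropped = apply_count_mask(example, "dropped")
```

Here only the second gene survives ‘dropped’, since the first has a zero and the third a missing value.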
-
property
active_count_variable
¶ Returns the name of the currently active count matrix.
-
debug
(**kwargs)¶ Inspect .param.debug method for the full docstring
-
defaults
(**kwargs)¶ Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.UpperQuartile'>)¶
-
property
gene_index_name
¶ Returns the name of the gene index.
-
get_gene_index
(count_variable=None) → numpy.core.multiarray.array¶ Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.UpperQuartile'>)¶
-
get_sample_index
() → numpy.core.multiarray.array¶ Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes
() → dict¶ Returns the currently selected indexes as a dictionary.
-
get_value_generator
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.UpperQuartile'>)¶
-
inspect_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.UpperQuartile'>)¶
-
instance
= functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.operations.normalizations.UpperQuartile'>)¶
-
message
(**kwargs)¶ Inspect .param.message method for the full docstring
-
static
np_upper_quartile
(counts)[source]¶ Perform the upper quartile normalization.
- Parameters
counts – A numpy array containing the raw count values. The shape is assumed to be (samples by genes). Zero counts are expected to be present as zeros.
- Returns
The upper quartile normalized count matrix.
-
params
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.UpperQuartile'>)¶
-
pprint
(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')¶ Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod
print_param_defaults
(*args, **kwargs)¶ Inspect .param.print_param_defaults method for the full docstring
-
print_param_values
(**kwargs)¶ Inspect .param.print_param_values method for the full docstring
-
process
()[source]¶ Perform the upper quartile normalization.
- Returns
The upper quartile normalized count matrix.
-
property
sample_index_name
¶ Returns the name of the sample index.
-
script_repr
(imports=[], prefix=' ')¶ Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property
selection
¶ Returns the currently selected data.
-
classmethod
set_default
(*args, **kwargs)¶ Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.UpperQuartile'>)¶
-
set_param
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.normalizations.UpperQuartile'>)¶
-
state_pop
()¶ Restore the most recently saved state.
See state_push() for more details.
-
state_push
()¶ Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose
(**kwargs)¶ Inspect .param.verbose method for the full docstring
-
warning
(**kwargs)¶ Inspect .param.warning method for the full docstring
-
property
x_count_data
¶ Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An Xarray.Dataset selection of the currently active ‘x_data’.
-
static
xr_upper_quartile
(counts)[source]¶ Perform the upper quartile normalization.
- Parameters
counts – An xarray.DataArray containing the raw count values. The shape is assumed to be (samples by genes). Zero counts are expected to be present as zeros.
- Returns
The upper quartile normalized count matrix.
-
property
y_annotation_data
¶ Returns the currently selected ‘y_data’, or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.
prospectors
Module¶
Prospector operations return either boolean support arrays or arrays of selected genes.
Prospector operations differ from analytics in that they are not required to return a ‘result’ for every gene, or to return the same result on each call.
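To make the two return styles concrete, the sketch below shows how a boolean support array relates to an array of selected genes, and how two supports combine by union or intersection. The gene names and support arrays are hypothetical, and plain NumPy arrays stand in for GSForge objects.

```python
import numpy as np

# Hypothetical gene index and two boolean support arrays over it.
genes = np.array(["g1", "g2", "g3", "g4"])
support_a = np.array([True, True, False, False])
support_b = np.array([False, True, True, False])

# A support array selects genes by boolean indexing.
selected_a = genes[support_a]

# Union: genes supported by either selection.
union_genes = genes[support_a | support_b]

# Intersection: genes supported by both selections.
intersection_genes = genes[support_a & support_b]
```

A support array always has one entry per gene in the index, whereas an array of selected genes lists only the chosen ones.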
-
class
GSForge.operations.prospectors.
create_random_lineament
(*args, **params)[source]¶ Bases:
GSForge.models._OperationInterface.OperationInterface
Creates a random lineament of size k. Picks from the gene index defined by the Interface options.
gem
= param.ClassSelector(class_=<class ‘GSForge.models._AnnotatedGEM.AnnotatedGEM’>)An AnnotatedGEM object.
gene_set_collection
= param.ClassSelector(class_=<class ‘GSForge.models._GeneSetCollection.GeneSetCollection’>)A GeneSetCollection object.
selected_gene_sets
= param.ListSelector(default=[None], objects=[])A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are to be used for selecting sets of genes from the count matrix.
selected_genes
= param.Parameter()A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selection methods, so selected lineaments (or combinations thereof) have no effect.
gene_set_mode
= param.ObjectSelector(default=’union’, objects=[‘complete’, ‘union’, ‘intersection’])Controls how any selected gene sets are returned by the interface. + complete Returns the entire gene set of the AnnotatedGEM. + union Returns the union of the selected gene sets support. + intersection Returns the intersection of the selected gene sets support.
sample_subset
= param.Parameter()A list of samples to use in a given operation. These can be supplied directly as a list of genes, or can be drawn from a given GeneSet.
count_variable
= param.String()The name of the count matrix used.
annotation_variables
= param.Parameter()The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask
= param.ObjectSelector(default=’complete’, objects=[‘complete’, ‘masked’, ‘dropped’])The type of mask to use for the count matrix. + ‘complete’ returns the entire count matrix as numbers. + ‘masked’ returns the entire count matrix with zero or missing as NaN values. + ‘dropped’ returns the count matrix without genes that have zero or missing values.
annotation_mask
= param.ObjectSelector(default=’complete’, objects=[‘complete’, ‘dropped’])The type of mask to use for the target array. + ‘complete’ returns the entire target array. + ‘masked’ returns the entire target array with zero or missing as NaN values. + ‘dropped’ returns the target array without samples that have zero or missing values.
count_transform
= param.Callable()A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
k
= param.Integer(default=100, inclusive_bounds=(True, True), time_dependent=False, time_fn=Time(label=’Time’, name=’Time00001’, time_type=<class ‘int’>, timestep=1.0, unit=None, until=Infinity()))-
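Picking a random selection of size k from a gene index can be sketched as below. This is only an illustration of the idea, assuming sampling without replacement; the function name `random_gene_selection`, the `seed` argument and the example gene names are hypothetical and not part of GSForge.

```python
import numpy as np


def random_gene_selection(gene_index, k=100, seed=None):
    """Draw k distinct genes uniformly at random from a gene index."""
    rng = np.random.default_rng(seed)
    gene_index = np.asarray(gene_index)
    # replace=False guarantees that no gene is picked twice.
    return rng.choice(gene_index, size=k, replace=False)


# Hypothetical gene index of 1000 genes.
all_genes = np.array([f"gene_{i}" for i in range(1000)])
picked = random_gene_selection(all_genes, k=100, seed=0)
```

A random selection of this kind is useful as a baseline when comparing how well other gene selections perform.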
property
active_count_variable
¶ Returns the name of the currently active count matrix.
-
debug
(**kwargs)¶ Inspect .param.debug method for the full docstring
-
defaults
(**kwargs)¶ Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.create_random_lineament'>)¶
-
property
gene_index_name
¶ Returns the name of the gene index.
-
get_gene_index
(count_variable=None) → numpy.core.multiarray.array¶ Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.create_random_lineament'>)¶
-
get_sample_index
() → numpy.core.multiarray.array¶ Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes
() → dict¶ Returns the currently selected indexes as a dictionary.
-
get_value_generator
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.create_random_lineament'>)¶
-
inspect_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.create_random_lineament'>)¶
-
instance
= functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.operations.prospectors.create_random_lineament'>)¶
-
message
(**kwargs)¶ Inspect .param.message method for the full docstring
-
params
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.create_random_lineament'>)¶
-
pprint
(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')¶ Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod
print_param_defaults
(*args, **kwargs)¶ Inspect .param.print_param_defaults method for the full docstring
-
print_param_values
(**kwargs)¶ Inspect .param.print_param_values method for the full docstring
-
property
sample_index_name
¶ Returns the name of the sample index.
-
script_repr
(imports=[], prefix=' ')¶ Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property
selection
¶ Returns the currently selected data.
-
classmethod
set_default
(*args, **kwargs)¶ Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.create_random_lineament'>)¶
-
set_param
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.create_random_lineament'>)¶
-
state_pop
()¶ Restore the most recently saved state.
See state_push() for more details.
-
state_push
()¶ Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose
(**kwargs)¶ Inspect .param.verbose method for the full docstring
-
warning
(**kwargs)¶ Inspect .param.warning method for the full docstring
-
property
x_count_data
¶ Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An Xarray.Dataset selection of the currently active ‘x_data’.
-
property
y_annotation_data
¶ Returns the currently selected ‘y_data’, or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.
-
GSForge.operations.prospectors.
parse_boruta_model
(boruta_model, gene_coords, attrs=None, dim='Gene') → xarray.core.dataset.Dataset[source]¶ Convert a boruta model into an xarray.Dataset object.
- Parameters
boruta_model – A boruta_py model.
attrs – A dictionary to be assigned to the output dataset attrs.
gene_coords – An array (index) of the genes passed to the boruta_model.
dim – The name of the coordinate dimension.
- Returns
An xarray.Dataset object.
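A conversion of this kind might look roughly like the sketch below. It is not the source of parse_boruta_model: the output variable names (`support`, `support_weak`, `ranking`) are assumptions, although the `support_`, `support_weak_` and `ranking_` attributes are what a fitted BorutaPy model exposes. A `SimpleNamespace` stands in for a fitted model so the example needs only numpy and xarray.

```python
from types import SimpleNamespace

import numpy as np
import xarray as xr


def boruta_like_to_dataset(model, gene_coords, attrs=None, dim="Gene"):
    """Collect per-gene Boruta outputs into an xarray.Dataset."""
    return xr.Dataset(
        {
            "support": ((dim,), np.asarray(model.support_)),
            "support_weak": ((dim,), np.asarray(model.support_weak_)),
            "ranking": ((dim,), np.asarray(model.ranking_)),
        },
        coords={dim: np.asarray(gene_coords)},
        attrs=attrs or {},
    )


# Stand-in for a fitted model over three hypothetical genes.
fake_model = SimpleNamespace(
    support_=np.array([True, False, True]),
    support_weak_=np.array([False, True, False]),
    ranking_=np.array([1, 2, 1]),
)
dataset = boruta_like_to_dataset(fake_model, ["g1", "g2", "g3"], attrs={"perc": 100})

# Genes confirmed by the model, recovered from the boolean support variable.
confirmed = dataset["Gene"].values[dataset["support"].values]
```

Keeping the gene index as a coordinate means the support array stays aligned with gene names when it is combined with other selections.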
-
class
GSForge.operations.prospectors.
boruta_prospector
(*args, **params)[source]¶ Bases:
GSForge.models._OperationInterface.OperationInterface
Runs a single instance of BorutaPy feature selection.
This is just a simple wrapper for a boruta model that produces an xarray.Dataset object suitable for use in the creation of a GSForge.GeneSet object.
gem
= param.ClassSelector(class_=<class ‘GSForge.models._AnnotatedGEM.AnnotatedGEM’>)An AnnotatedGEM object.
gene_set_collection
= param.ClassSelector(class_=<class ‘GSForge.models._GeneSetCollection.GeneSetCollection’>)A GeneSetCollection object.
selected_gene_sets
= param.ListSelector(default=[None], objects=[])A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are to be used for selecting sets of genes from the count matrix.
selected_genes
= param.Parameter()A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selection methods, so selected lineaments (or combinations thereof) have no effect.
gene_set_mode
= param.ObjectSelector(default=’union’, objects=[‘complete’, ‘union’, ‘intersection’])Controls how any selected gene sets are returned by the interface. + complete Returns the entire gene set of the AnnotatedGEM. + union Returns the union of the selected gene sets support. + intersection Returns the intersection of the selected gene sets support.
sample_subset
= param.Parameter()A list of samples to use in a given operation. These can be supplied directly as a list of genes, or can be drawn from a given GeneSet.
count_variable
= param.String()The name of the count matrix used.
annotation_variables
= param.Parameter()The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask
= param.ObjectSelector(default=’complete’, objects=[‘complete’, ‘masked’, ‘dropped’])The type of mask to use for the count matrix. + ‘complete’ returns the entire count matrix as numbers. + ‘masked’ returns the entire count matrix with zero or missing as NaN values. + ‘dropped’ returns the count matrix without genes that have zero or missing values.
annotation_mask
= param.ObjectSelector(default=’complete’, objects=[‘complete’, ‘dropped’])The type of mask to use for the target array. + ‘complete’ returns the entire target array. + ‘masked’ returns the entire target array with zero or missing as NaN values. + ‘dropped’ returns the target array without samples that have zero or missing values.
count_transform
= param.Callable()A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
estimator
= param.Parameter()A supervised learning estimator, with a ‘fit’ method that returns the feature_importances_ attribute. Important features must correspond to high absolute values in the feature_importances_.
n_estimators
= param.Parameter(default=1000)If an int, sets the number of estimators in the chosen ensemble method. If ‘auto’, this is determined automatically based on the size of the dataset. The other parameters of the estimator must be set at initialisation.
perc
= param.Integer(default=100, inclusive_bounds=(True, True), time_dependent=False, time_fn=Time(label=’Time’, name=’Time00001’, time_type=<class ‘int’>, timestep=1.0, unit=None, until=Infinity()))The percentile of the shadow feature importances used as the threshold for comparison between shadow and real features, instead of the maximum. The maximum tends to be too stringent, and this parameter provides finer control. The lower perc is, the more false positives will be picked as relevant, but also the fewer relevant features will be left out. The default of 100 corresponds to the maximum, which is essentially vanilla Boruta.
alpha
= param.Number(default=0.05, inclusive_bounds=(True, True), time_dependent=False, time_fn=Time(label=’Time’, name=’Time00001’, time_type=<class ‘int’>, timestep=1.0, unit=None, until=Infinity()))Level at which the corrected p-values will get rejected in both correction steps.
two_step
= param.Boolean(bounds=(0, 1), default=True)Set this to False to use the original implementation of Boruta, which applies only the Bonferroni correction.
max_iter
= param.Integer(default=100, inclusive_bounds=(True, True), time_dependent=False, time_fn=Time(label=’Time’, name=’Time00001’, time_type=<class ‘int’>, timestep=1.0, unit=None, until=Infinity()))The number of maximum iterations to perform.
random_state
= param.Parameter()If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose
= param.Integer(default=0, inclusive_bounds=(True, True), time_dependent=False, time_fn=Time(label=’Time’, name=’Time00001’, time_type=<class ‘int’>, timestep=1.0, unit=None, until=Infinity()))Controls verbosity of output: - 0: no output - 1: displays iteration number - 2: which features have been selected already
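The effect of the perc parameter can be seen in a small sketch of the threshold comparison it describes: real feature importances are compared against a percentile of the shadow feature importances. This illustrates the idea only and is not the BorutaPy source; the function name `shadow_threshold_hits` and the importance values are hypothetical.

```python
import numpy as np


def shadow_threshold_hits(real_importances, shadow_importances, perc=100):
    """Flag real features whose importance exceeds the shadow percentile."""
    # perc=100 is the shadow maximum; lower values relax the threshold.
    threshold = np.percentile(shadow_importances, perc)
    return np.asarray(real_importances) > threshold, threshold


# Hypothetical importances for four real and four shadow features.
real = np.array([0.30, 0.12, 0.05, 0.20])
shadow = np.array([0.02, 0.10, 0.15, 0.04])

hits_max, threshold_max = shadow_threshold_hits(real, shadow, perc=100)
hits_median, threshold_median = shadow_threshold_hits(real, shadow, perc=50)
```

With these values, lowering perc from 100 to 50 admits one more feature, which shows the trade-off between stringency and false positives.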
-
property
active_count_variable
¶ Returns the name of the currently active count matrix.
-
debug
(**kwargs)¶ Inspect .param.debug method for the full docstring
-
defaults
(**kwargs)¶ Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.boruta_prospector'>)¶
-
property
gene_index_name
¶ Returns the name of the gene index.
-
get_gene_index
(count_variable=None) → numpy.core.multiarray.array¶ Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.boruta_prospector'>)¶
-
get_sample_index
() → numpy.core.multiarray.array¶ Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes
() → dict¶ Returns the currently selected indexes as a dictionary.
-
get_value_generator
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.boruta_prospector'>)¶
-
inspect_value
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.boruta_prospector'>)¶
-
instance
= functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.operations.prospectors.boruta_prospector'>)¶
-
message
(**kwargs)¶ Inspect .param.message method for the full docstring
-
params
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.boruta_prospector'>)¶
-
pprint
(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')¶ Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod
print_param_defaults
(*args, **kwargs)¶ Inspect .param.print_param_defaults method for the full docstring
-
print_param_values
(**kwargs)¶ Inspect .param.print_param_values method for the full docstring
-
property
sample_index_name
¶ Returns the name of the sample index.
-
script_repr
(imports=[], prefix=' ')¶ Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property
selection
¶ Returns the currently selected data.
-
classmethod
set_default
(*args, **kwargs)¶ Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.boruta_prospector'>)¶
-
set_param
= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.operations.prospectors.boruta_prospector'>)¶
-
state_pop
()¶ Restore the most recently saved state.
See state_push() for more details.
-
state_push
()¶ Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
warning
(**kwargs)¶ Inspect .param.warning method for the full docstring
-
property
x_count_data
¶ Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An Xarray.Dataset selection of the currently active ‘x_data’.
-
property
y_annotation_data
¶ Returns the currently selected ‘y_data’, or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.