GSForge.models Package
There are two ‘core’ data models in GSForge, both of which store their associated data in an xarray.Dataset object under the .data attribute. Consult the xarray documentation for any transform or selection not provided by GSForge.
The two ‘core’ data classes are:
- AnnotatedGEM
Contains the gene expression matrix, which is indexed by ‘Gene’ and ‘Sample’ coordinates. This xarray.Dataset object also contains (but is not limited to) phenotype information.
- GeneSet
A GeneSet is a set of genes and any associated values. A GeneSet can also mark a set of ‘supported’ genes, i.e. genes that are ‘within’ the given GeneSet.
The interface classes provide patterns of data access and common transformations that researchers may need from the core data classes. They are:
- GeneSetCollection
The work-horse of the GSForge package. This object contains an AnnotatedGEM and a Python dictionary of {name: GeneSet} objects. It provides functions for comparing and analyzing GeneSets, as well as tools to pass GeneSet-derived subsets to other functions.
- Interface
The Interface object provides a common API for interacting with an AnnotatedGEM or GeneSetCollection. It provides functions for pulling gene or sample subsets and for accessing any transforms of the count matrix.
- OperationInterface
The same as Interface above, except that it is abstract: calling it runs a single function, process(), which a subclass must define.
-
class GSForge.models.AnnotatedGEM(*args, **params)
Bases: param.parameterized.Parameterized
A data class for a gene expression matrix and any associated sample or gene annotations.
This model holds the count matrix and any associated labels or annotations in an xarray.Dataset object under the .data attribute. By default this dataset is expected to have its indexes named “Gene” and “Sample”, although parameters are available to override the array and index names used.
An AnnotatedGEM object can be created with one of the class methods:
from_files()
A helper function for loading disparate GEM and annotation files through pandas.read_csv().
from_pandas()
Reads in a GEM pandas.DataFrame and an optional annotation DataFrame. These must share the same sample index.
from_netcdf()
Reads from a .nc file path. Usually this means loading a previously created AnnotatedGEM.
Randomly generate a demo AnnotatedGEM:
>>> import pandas as pd
>>> from sklearn.datasets import make_multilabel_classification
>>> from GSForge.models import AnnotatedGEM
>>> data, labels = make_multilabel_classification(n_samples=100, n_features=100)
>>> agem = AnnotatedGEM.from_pandas(pd.DataFrame(data), pd.DataFrame(labels), name="Generated GEM")
>>> agem
<GSForge.AnnotatedGEM>
Name: Generated GEM
Selected GEM Variable: 'counts'
    Gene   100
    Sample 100
View the entire gene or sample index:
>>> agem.gene_index
<xarray.DataArray 'Gene' (Gene: 100)>…
>>> agem.sample_index
<xarray.DataArray 'Sample' (Sample: 100)>…
Infer categories for the annotation variables:
>>> agem.infer_variables()
{'all_labels': …
data = param.ClassSelector(class_=<class 'xarray.core.dataset.Dataset'>)
An xarray.Dataset object that contains the gene expression matrix and any needed annotations. This xarray.Dataset object is expected to have a count array named ‘counts’ with coordinates (‘Gene’, ‘Sample’).
count_array_name = param.String(default='counts')
This parameter controls which variable from the xarray.Dataset is considered the ‘count’ variable. Use it if your count array has a different name, or to control which of several count arrays is used by default.
sample_index_name = param.String(default='Sample')
This parameter controls which variable from the xarray.Dataset is considered the ‘sample’ coordinate. Use it if your data uses different coordinate names.
gene_index_name = param.String(default='Gene')
This parameter controls which variable from the xarray.Dataset is considered the ‘gene index’ coordinate. Use it if your data uses different coordinate names.
-
property count_array_names
Returns a list of all available count arrays contained within this AnnotatedGEM object.
This is done by returning all data variables that have the same dimension set as the default count array.
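The selection rule behind count_array_names can be illustrated with plain Python. This minimal sketch assumes a hypothetical mapping of data-variable names to dimension tuples rather than a real xarray.Dataset (with xarray, such a mapping could be built as {name: var.dims for name, var in ds.data_vars.items()}); the "tpm" variable is invented for the example, and this is not GSForge's implementation.

```python
def count_array_names(variable_dims, count_array_name="counts"):
    """Return every variable whose dimension set matches the default count array.

    variable_dims: hypothetical {variable name: tuple of dimension names} mapping.
    """
    target = set(variable_dims[count_array_name])
    # Compare as sets, so ("Sample", "Gene") matches ("Gene", "Sample").
    return [name for name, dims in variable_dims.items() if set(dims) == target]


# Two gene-by-sample arrays and one per-sample annotation.
dims = {
    "counts": ("Gene", "Sample"),
    "tpm": ("Gene", "Sample"),
    "treatment": ("Sample",),
}
print(count_array_names(dims))  # ['counts', 'tpm']
```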
-
debug(**kwargs)
Inspect .param.debug method for the full docstring.
-
defaults(**kwargs)
Inspect .param.defaults method for the full docstring.
-
force_new_dynamic_value
Deprecated stub; inspect .param.force_new_dynamic_value method for the full docstring.
-
classmethod from_files(count_path: str, label_path: str = None, count_kwargs: dict = None, label_kwargs: dict = None, **params)
Construct an AnnotatedGEM object from file paths and optional parsing arguments.
- Parameters
count_path – The path to the gene expression matrix.
label_path – The path to the annotation (label) data.
count_kwargs – Arguments to be passed to pandas.read_csv for the count matrix.
label_kwargs – Arguments to be passed to pandas.read_csv for the annotations.
- Returns
An instance of the AnnotatedGEM class.
-
classmethod from_netcdf(netcdf_path, **params)
Construct an AnnotatedGEM object from a netcdf (.nc) file path.
- Parameters
netcdf_path – A path to a netcdf file. If this file uses index or array names other than the defaults (Gene, Sample, counts), be sure to set the corresponding parameters explicitly (gene_index_name, sample_index_name, count_array_name).
-
classmethod from_pandas(count_df: pandas.core.frame.DataFrame, label_df: pandas.core.frame.DataFrame = None, **params)
Construct an AnnotatedGEM object from pandas.DataFrame objects.
- Parameters
count_df – The gene expression matrix as a pandas.DataFrame. This DataFrame is assumed to have genes as rows and samples as columns.
label_df – The annotation data as a pandas.DataFrame. This DataFrame is assumed to have samples as rows and annotation observations as columns.
- Returns
An instance of the AnnotatedGEM class.
-
property gene_index
Returns the entire gene index of this AnnotatedGEM object as an xarray.DataArray. The actual variable or coordinate returned is controlled by the gene_index_name parameter.
-
get_param_values
Deprecated stub; inspect .param.get_param_values method for the full docstring.
-
get_value_generator
Deprecated stub; inspect .param.get_value_generator method for the full docstring.
-
infer_variables(quantile_size=10, skip=None) → dict
Infer categories for the variables in the AnnotatedGEM’s labels.
- Parameters
quantile_size – The maximum number of unique values a variable may have and still be considered a quantile-able set of values.
skip – The variables to be skipped.
- Returns
A dictionary of the inferred value types.
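The role of quantile_size and skip can be sketched as a threshold on the number of unique values per annotation variable. This is an illustration only: the input is a hypothetical plain mapping of variable names to values, and apart from 'all_labels' (which appears in the AnnotatedGEM example output) the category names are invented for this sketch; GSForge's actual categories may differ.

```python
def infer_variables(labels, quantile_size=10, skip=None):
    """Sketch: group annotation variables by how many unique values they hold.

    labels: hypothetical {variable name: list of values} mapping.
    'quantile_candidates' and 'other' are illustrative category names.
    """
    skip = set(skip or ())
    result = {"all_labels": [], "quantile_candidates": [], "other": []}
    for name, values in labels.items():
        if name in skip:
            continue
        result["all_labels"].append(name)
        # At most quantile_size unique values: still treated as quantile-able.
        if len(set(values)) <= quantile_size:
            result["quantile_candidates"].append(name)
        else:
            result["other"].append(name)
    return result


labels = {
    "treatment": ["control", "drought", "control", "drought"],
    "time": [1, 2, 3, 4],
    "sample_id": ["s1", "s2", "s3", "s4"],
}
print(infer_variables(labels, quantile_size=2, skip=["sample_id"]))
# {'all_labels': ['treatment', 'time'], 'quantile_candidates': ['treatment'], 'other': ['time']}
```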
-
inspect_value
Deprecated stub; inspect .param.inspect_value method for the full docstring.
-
message(**kwargs)
Inspect .param.message method for the full docstring.
-
params
Deprecated stub; inspect .param.params method for the full docstring.
-
pprint(imports=None, prefix=' ', unknown_value='<?>', qualify=False, separator='')
(Experimental) Pretty-printed representation that may be evaluated with eval. See the pprint() function for more details.
-
classmethod print_param_defaults(*args, **kwargs)
Inspect .param.print_param_defaults method for the full docstring.
-
print_param_values(**kwargs)
Inspect .param.print_param_values method for the full docstring.
-
property sample_index
Returns the entire sample index of this AnnotatedGEM object as an xarray.DataArray. The actual variable or coordinate returned is controlled by the sample_index_name parameter.
-
save(path)
Save as a netcdf (.nc) file at path.
- Parameters
path – The file path to save to. This should use the .nc extension.
- Returns
The path to which the file was saved.
-
script_repr(imports=[], prefix=' ')
Variant of __repr__ designed for generating a runnable script.
-
classmethod set_default(*args, **kwargs)
Inspect .param.set_default method for the full docstring.
-
set_dynamic_time_fn
Deprecated stub; inspect .param.set_dynamic_time_fn method for the full docstring.
-
set_param
Deprecated stub; inspect .param.set_param method for the full docstring.
-
state_pop()
Restore the most recently saved state.
See state_push() for more details.
-
state_push()
Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose(**kwargs)
Inspect .param.verbose method for the full docstring.
-
warning(**kwargs)
Inspect .param.warning method for the full docstring.
-
class GSForge.models.GeneSet(*args, **params)
Bases: param.parameterized.Parameterized
A data class for the result of a gene selection or analysis.
A GeneSet can also be a measurement or ranking of a set of genes, which may include all of the ‘available’ genes. In such cases a boolean array ‘support’ indicates membership in the GeneSet.
data = param.Parameter()
Contains a gene-indexed xarray.Dataset object. It should contain either only those genes that are considered ‘within’ the GeneSet, or a boolean variable named ‘support’ indicating membership.
support_index_name = param.String(default='support')
This parameter controls which variable is considered the (boolean) variable indicating membership in this GeneSet.
gene_index_name = param.String(default='Gene')
This parameter controls which variable from the xarray.Dataset is considered the ‘gene index’ coordinate. Use it if your data uses different coordinate names.
-
debug(**kwargs)
Inspect .param.debug method for the full docstring.
-
defaults(**kwargs)
Inspect .param.defaults method for the full docstring.
-
force_new_dynamic_value
Deprecated stub; inspect .param.force_new_dynamic_value method for the full docstring.
-
classmethod from_GeneSets(*gene_sets, mode: str = 'union', attrs=None, **params)
Create a new GeneSet by combining the genes in the given GeneSets according to mode ('union' or 'intersection').
No variables or attributes from the original GeneSets are retained in this process.
-
classmethod from_netcdf(path, **params)
Construct a GeneSet object from a netcdf file path.
-
property gene_index
Returns the entire gene index of this GeneSet object as an xarray.DataArray. The variable or coordinate returned is controlled by the gene_index_name parameter.
- Returns
The entire gene index of this GeneSet as an xarray.DataArray.
-
gene_support() → numpy.core.multiarray.array
Returns the genes ‘supported’ by this GeneSet.
The value returned is (by default) controlled by the support_index_name parameter.
- Returns
A numpy array of the genes ‘supported’ by this GeneSet.
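When a GeneSet spans all available genes, its supported genes are those whose boolean ‘support’ flag is True. The stdlib sketch below shows that masking rule, with plain lists standing in for the gene index and the support variable (an assumption for illustration; GSForge itself works on xarray objects).

```python
def gene_support(genes, support):
    """Return the genes whose support flag is True, in index order."""
    if len(genes) != len(support):
        raise ValueError("genes and support must have the same length")
    return [gene for gene, is_member in zip(genes, support) if is_member]


print(gene_support(["g1", "g2", "g3"], [True, False, True]))  # ['g1', 'g3']
```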
-
get_param_values
Deprecated stub; inspect .param.get_param_values method for the full docstring.
-
get_value_generator
Deprecated stub; inspect .param.get_value_generator method for the full docstring.
-
inspect_value
Deprecated stub; inspect .param.inspect_value method for the full docstring.
-
k_best_genes(k=100, score_name=None) → numpy.core.multiarray.array
Select the k highest-scoring genes from the score_name variable.
- Parameters
k – The number of genes to return.
score_name – The variable name to rank genes by.
- Returns
A numpy array of the top k genes based on their scores in score_name.
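Selecting the k best genes amounts to ranking by score and keeping the first k. This sketch assumes ‘best’ means largest, and uses a hypothetical {gene: score} dictionary in place of the score_name variable.

```python
def k_best_genes(scores, k=100):
    """Return up to k gene names ordered from highest to lowest score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


scores = {"g1": 0.2, "g2": 0.9, "g3": 0.5, "g4": 0.7}
print(k_best_genes(scores, k=2))  # ['g2', 'g4']
```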
-
message(**kwargs)
Inspect .param.message method for the full docstring.
-
params
Deprecated stub; inspect .param.params method for the full docstring.
-
static parse_GeneSets(*gene_sets, mode: str = 'union', attrs=None, **params)
Combines the given GeneSet objects using mode to create a single new GeneSet object. Since the complete gene index is not necessarily known, it must minimally be the union of all genes included in the provided gene sets.
- Parameters
gene_sets – One or more GSForge.GeneSet objects.
mode – Mode by which the gene_sets should be combined. Options are “union” or “intersection”.
attrs – Optional attributes for the combined GeneSet. These attributes are added to the GeneSet.data.attrs attribute.
params – Keyword parameters for the GeneSet object to be initialized with.
- Returns
A new GeneSet object that contains genes from the provided gene_sets.
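The union/intersection semantics, together with the rule that the combined gene index is at least the union of all input genes, can be sketched with Python sets. The (gene_index, support) pair returned here is an assumed plain-Python stand-in for the xarray.Dataset that parse_GeneSets actually builds.

```python
def combine_gene_sets(supports, mode="union"):
    """Sketch: combine supported-gene collections as parse_GeneSets describes.

    Returns (gene_index, support): the index is the union of all genes seen,
    and support flags which of those genes belong to the combined set.
    """
    sets = [set(s) for s in supports]
    gene_index = sorted(set().union(*sets))
    if mode == "union":
        members = set(gene_index)
    elif mode == "intersection":
        members = set.intersection(*sets) if sets else set()
    else:
        raise ValueError(f"mode must be 'union' or 'intersection', got {mode!r}")
    support = [gene in members for gene in gene_index]
    return gene_index, support


index, support = combine_gene_sets([["g1", "g2", "g3"], ["g2", "g3", "g4"]], mode="intersection")
print(index)    # ['g1', 'g2', 'g3', 'g4']
print(support)  # [False, True, True, False]
```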
-
static parse_pandas(dataframe, genes=None, attrs=None, **params)
Parse a pandas.DataFrame for use in a GeneSet.
-
pprint(imports=None, prefix=' ', unknown_value='<?>', qualify=False, separator='')
(Experimental) Pretty-printed representation that may be evaluated with eval. See the pprint() function for more details.
-
classmethod print_param_defaults(*args, **kwargs)
Inspect .param.print_param_defaults method for the full docstring.
-
print_param_values(**kwargs)
Inspect .param.print_param_values method for the full docstring.
-
q_best_genes(q=0.999, score_name=None) → numpy.core.multiarray.array
Returns a numpy array of the best genes, as determined by the quantile cutoff q applied to the target variable score_name.
- Parameters
q – The quantile cutoff.
score_name – The target variable by which to judge the genes.
- Returns
A numpy array of the genes in the top q quantile of score_name.
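A quantile cutoff keeps only genes whose score reaches the q-th quantile of all scores. The sketch below computes the quantile by linear interpolation, which is the default rule of numpy.quantile; whether GSForge compares with > or ≥ is not stated above, so keeping genes at or above the cutoff is an assumption, as is the hypothetical {gene: score} input.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics (numpy.quantile's default)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take the quantile of an empty sequence")
    position = q * (len(ordered) - 1)
    lower = int(position)  # floor, since position >= 0
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def q_best_genes(scores, q=0.999):
    """Return the genes whose score is at or above the q quantile of all scores."""
    cutoff = quantile(scores.values(), q)
    return [gene for gene, score in scores.items() if score >= cutoff]


scores = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 5.0}
print(q_best_genes(scores, q=0.75))  # ['g4', 'g5']
```

With the default q=0.999 only the single highest-scoring gene survives in this five-gene example.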
-
save_as_netcdf(target_dir=None, name=None)
Save this GeneSet as a netcdf (.nc) file in the target_dir directory.
The default filename is {GeneSet.name}.nc. If the GeneSet does not have a name, one must be provided via the name argument.
- Parameters
target_dir – The directory to place the saved GeneSet into.
name – The name to give the GeneSet upon saving.
- Returns
output_path – The path to which the file was saved.
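The naming rule can be sketched with pathlib. Two behaviours here are assumptions not stated in the docstring: an explicit name takes precedence over the GeneSet’s own name, and target_dir=None means the current working directory. netcdf_output_path is a hypothetical helper, not part of GSForge.

```python
from pathlib import Path


def netcdf_output_path(gene_set_name=None, target_dir=None, name=None):
    """Sketch of the '{name}.nc' naming rule described for save_as_netcdf."""
    # Assumption: an explicit name takes precedence over the GeneSet's own name.
    filename = name or gene_set_name
    if not filename:
        raise ValueError("The GeneSet has no name; provide one via the name argument.")
    # Assumption: target_dir=None means the current working directory.
    directory = Path(target_dir) if target_dir is not None else Path.cwd()
    return directory / f"{filename}.nc"


print(netcdf_output_path("drought_genes", target_dir="gene_sets").as_posix())
# gene_sets/drought_genes.nc
```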
-
script_repr(imports=[], prefix=' ')
Variant of __repr__ designed for generating a runnable script.
-
classmethod set_default(*args, **kwargs)
Inspect .param.set_default method for the full docstring.
-
set_dynamic_time_fn
Deprecated stub; inspect .param.set_dynamic_time_fn method for the full docstring.
-
set_param
Deprecated stub; inspect .param.set_param method for the full docstring.
-
state_pop()
Restore the most recently saved state.
See state_push() for more details.
-
state_push()
Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose(**kwargs)
Inspect .param.verbose method for the full docstring.
-
warning(**kwargs)
Inspect .param.warning method for the full docstring.
-
class GSForge.models.GeneSetCollection(**params)
Bases: param.parameterized.Parameterized
A data class that holds an AnnotatedGEM and a dictionary of associated GeneSet objects.
gem = param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>)
A Gene Expression Matrix (GEM) object.
gene_sets = param.Dict(class_=<class 'dict'>)
A dictionary of {key: xarray.DataArray}, boolean arrays indicating support for a given gene.
-
as_dict(keys=None, exclude=None)
Returns a dictionary of {name: supported_genes} for each GeneSet, or only for those specified by the keys argument.
- Parameters
keys – The list of GeneSet keys to be included in the returned dictionary.
exclude – A list of GeneSet keys to exclude from the returned dictionary.
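The interaction of keys and exclude can be sketched as: start from keys (or every key), then drop anything in exclude. The plain {key: list of genes} input is a hypothetical stand-in for the collection’s GeneSet objects, and the KeyError on unknown keys is simply what a dict lookup does here, not documented GSForge behaviour.

```python
def as_dict(supports, keys=None, exclude=None):
    """Sketch: {name: supported_genes}, narrowed by keys and then filtered by exclude.

    supports: hypothetical {GeneSet key: list of supported genes} mapping.
    """
    chosen = list(supports) if keys is None else list(keys)
    excluded = set(exclude or ())
    # An unknown key in keys raises KeyError here, as a plain dict lookup would.
    return {key: supports[key] for key in chosen if key not in excluded}


supports = {"set_a": ["g1", "g2"], "set_b": ["g2", "g3"], "set_c": ["g4"]}
print(as_dict(supports, exclude=["set_c"]))
# {'set_a': ['g1', 'g2'], 'set_b': ['g2', 'g3']}
```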
-
debug(**kwargs)
Inspect .param.debug method for the full docstring.
-
defaults(**kwargs)
Inspect .param.defaults method for the full docstring.
-
force_new_dynamic_value
Deprecated stub; inspect .param.force_new_dynamic_value method for the full docstring.
-
classmethod from_folder(gem, target_dir, glob_filter='*.nc', filter_func=None, **params)
Create a GeneSetCollection from the GeneSet files in target_dir that match glob_filter. The base file names are used as the key values.
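The file-discovery step can be sketched with pathlib: glob the directory, optionally filter, and key each path by its base name. Two points are assumptions: that filter_func is a predicate applied to each path (its role is undocumented above), and that the key drops the file extension. discover_gene_set_files is a hypothetical helper; loading each file into a GeneSet is left out.

```python
import tempfile
from pathlib import Path


def discover_gene_set_files(target_dir, glob_filter="*.nc", filter_func=None):
    """Sketch: map base file names (without extension) to matching file paths."""
    paths = sorted(Path(target_dir).glob(glob_filter))
    if filter_func is not None:
        # Assumption: filter_func is a predicate called with each path.
        paths = [p for p in paths if filter_func(p)]
    return {p.stem: p for p in paths}


# Usage: three files, two of which match the default '*.nc' filter.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ["set_a.nc", "set_b.nc", "notes.txt"]:
        (Path(tmp) / filename).touch()
    found = discover_gene_set_files(tmp)
    filtered = discover_gene_set_files(tmp, filter_func=lambda p: p.stem != "set_b")
print(sorted(found))  # ['set_a', 'set_b']
```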
-
get_param_values
Deprecated stub; inspect .param.get_param_values method for the full docstring.
-
get_support(key) → numpy.core.multiarray.array
Get the support array for a given key.
- Parameters
key – The key of the GeneSet from which to get the gene support.
- Returns
A numpy array of the genes that make up the support of this GeneSet.
-
get_value_generator
Deprecated stub; inspect .param.get_value_generator method for the full docstring.
-
inspect_value
Deprecated stub; inspect .param.inspect_value method for the full docstring.
-
intersection(keys=None, exclude=None)
Get the intersection of supported genes in this GeneSetCollection.
-
message(**kwargs)
Inspect .param.message method for the full docstring.
-
pairwise_percent_intersection(keys=None)
Get the normalized intersection length of each pair of GeneSets.
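The docstring does not say what the intersection length is normalized by. The sketch below assumes it is divided by the size of the first GeneSet in each ordered pair, so each value reads as "the fraction of set A’s genes also found in set B"; empty sets are skipped to avoid division by zero. The plain {key: genes} input is a hypothetical stand-in for the collection.

```python
from itertools import permutations


def pairwise_percent_intersection(supports):
    """Sketch: fraction of each GeneSet's genes that also appear in another GeneSet.

    Assumption: normalized by the size of the first set in each ordered pair.
    """
    non_empty = {key: set(genes) for key, genes in supports.items() if genes}
    return [
        (a, b, len(genes_a & genes_b) / len(genes_a))
        for (a, genes_a), (b, genes_b) in permutations(non_empty.items(), 2)
    ]


supports = {"set_a": ["g1", "g2", "g3", "g4"], "set_b": ["g3", "g4"]}
for row in pairwise_percent_intersection(supports):
    print(row)
# ('set_a', 'set_b', 0.5)
# ('set_b', 'set_a', 1.0)
```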
-
params
Deprecated stub; inspect .param.params method for the full docstring.
-
pprint(imports=None, prefix=' ', unknown_value='<?>', qualify=False, separator='')
(Experimental) Pretty-printed representation that may be evaluated with eval. See the pprint() function for more details.
-
classmethod print_param_defaults(*args, **kwargs)
Inspect .param.print_param_defaults method for the full docstring.
-
print_param_values(**kwargs)
Inspect .param.print_param_values method for the full docstring.
-
save(target_dir, keys=None)
Save this collection to target_dir. Each GeneSet is saved as a separate netcdf (.nc) file within this directory.
- Parameters
target_dir – The directory to which the GeneSet xarray.Dataset netcdf files will be written.
keys – The list of GeneSet keys that should be saved. If this is not provided, all GeneSet objects are saved.
-
script_repr(imports=[], prefix=' ')
Variant of __repr__ designed for generating a runnable script.
-
classmethod set_default(*args, **kwargs)
Inspect .param.set_default method for the full docstring.
-
set_dynamic_time_fn
Deprecated stub; inspect .param.set_dynamic_time_fn method for the full docstring.
-
set_param
Deprecated stub; inspect .param.set_param method for the full docstring.
-
state_pop()
Restore the most recently saved state.
See state_push() for more details.
-
state_push()
Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
union(keys=None, exclude=None)
Get the union of supported genes in this GeneSetCollection.
-
verbose(**kwargs)
Inspect .param.verbose method for the full docstring.
-
warning(**kwargs)
Inspect .param.warning method for the full docstring.
-
class GSForge.models.Interface(*args, **params)
Bases: param.parameterized.Parameterized
The Interface provides common API access for interacting with the AnnotatedGEM and GeneSetCollection objects. It also accepts an AnnotatedGEM and a single GeneSet for subset selection.
To update default parameters within subclasses, use the following, although it may cause ‘watching’ parameters to fire:
self.set_param(key=value)
gem = param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>)
An AnnotatedGEM object.
gene_set_collection = param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>)
A GeneSetCollection object.
selected_gene_sets = param.ListSelector(default=[None], objects=[])
A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are used to select sets of genes from the count matrix.
selected_genes = param.Parameter()
A list of genes to use when indexing the count matrix. This parameter takes priority over all other gene-selection methods, so any selected gene sets (or combinations thereof) have no effect.
gene_set_mode = param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection'])
Controls how any selected gene sets are returned by the interface.
+ complete: Returns the entire gene index of the AnnotatedGEM.
+ union: Returns the union of the selected gene sets' support.
+ intersection: Returns the intersection of the selected gene sets' support.
sample_subset = param.Parameter()
A list of samples to use in a given operation. These can be supplied directly as a list of samples, or can be drawn from a given GeneSet.
count_variable = param.String()
The name of the count matrix used.
annotation_variables = param.Parameter()
The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask = param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped'])
The type of mask to use for the count matrix.
+ 'complete' returns the entire count matrix as numbers.
+ 'masked' returns the entire count matrix with zero or missing values as NaN.
+ 'dropped' returns the count matrix without genes that have zero or missing values.
annotation_mask = param.ObjectSelector(default='complete', objects=['complete', 'dropped'])
The type of mask to use for the target array.
+ 'complete' returns the entire target array.
+ 'masked' returns the entire target array with zero or missing values as NaN.
+ 'dropped' returns the target array without samples that have zero or missing values.
count_transform = param.Callable()
A transform that is run on the x_data supplied by this Interface. The transform runs on the selected subset of the matrix.
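The priority between selected_genes, selected_gene_sets and gene_set_mode, and the three count_mask options, can be sketched in plain Python. This illustrates the documented rules under stated assumptions, not the Interface implementation: plain dicts and lists replace xarray objects, the complete index is used when no gene set is selected, 'complete' returns the counts unchanged, and 'dropped' removes any gene with at least one zero or missing count. select_genes and apply_count_mask are hypothetical names.

```python
import math


def select_genes(all_genes, supports, selected_gene_sets=None,
                 selected_genes=None, gene_set_mode="union"):
    """Sketch of the documented selection priority."""
    # selected_genes takes priority over every gene-set option.
    if selected_genes is not None:
        return list(selected_genes)
    # Assumption: with no gene sets selected, fall back to the complete index.
    if gene_set_mode == "complete" or not selected_gene_sets:
        return list(all_genes)
    sets = [set(supports[key]) for key in selected_gene_sets]
    if gene_set_mode == "union":
        keep = set().union(*sets)
    elif gene_set_mode == "intersection":
        keep = set.intersection(*sets)
    else:
        raise ValueError(f"unknown gene_set_mode: {gene_set_mode!r}")
    return [gene for gene in all_genes if gene in keep]  # keep index order


def apply_count_mask(counts, count_mask="complete"):
    """counts: hypothetical {gene: [count per sample]}; None marks a missing count."""
    def empty(value):
        return value is None or value == 0

    if count_mask == "complete":
        return counts  # returned unchanged
    if count_mask == "masked":
        return {g: [math.nan if empty(v) else v for v in row] for g, row in counts.items()}
    if count_mask == "dropped":
        # Assumption: a gene is dropped if any of its counts is zero or missing.
        return {g: row for g, row in counts.items() if not any(empty(v) for v in row)}
    raise ValueError(f"unknown count_mask: {count_mask!r}")


genes = ["g1", "g2", "g3", "g4"]
supports = {"set_a": ["g1", "g2"], "set_b": ["g2", "g4"]}
print(select_genes(genes, supports, ["set_a", "set_b"]))  # ['g1', 'g2', 'g4']
print(select_genes(genes, supports, ["set_a", "set_b"], gene_set_mode="intersection"))  # ['g2']
print(select_genes(genes, supports, ["set_a"], selected_genes=["g3"]))  # ['g3']

counts = {"g1": [5, 0, 3], "g2": [1, 2, 4], "g3": [None, 7, 1]}
print(apply_count_mask(counts, "dropped"))  # {'g2': [1, 2, 4]}
```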
-
property active_count_variable
Returns the name of the currently active count matrix.
-
debug(**kwargs)
Inspect .param.debug method for the full docstring.
-
defaults(**kwargs)
Inspect .param.defaults method for the full docstring.
-
force_new_dynamic_value
Deprecated stub; inspect .param.force_new_dynamic_value method for the full docstring.
-
property gene_index_name
Returns the name of the gene index.
-
get_gene_index(count_variable=None) → numpy.core.multiarray.array
Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
Deprecated stub; inspect .param.get_param_values method for the full docstring.
-
get_sample_index() → numpy.core.multiarray.array
Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_value_generator
Deprecated stub; inspect .param.get_value_generator method for the full docstring.
-
inspect_value
Deprecated stub; inspect .param.inspect_value method for the full docstring.
-
message(**kwargs)
Inspect .param.message method for the full docstring.
-
params
Deprecated stub; inspect .param.params method for the full docstring.
-
pprint(imports=None, prefix=' ', unknown_value='<?>', qualify=False, separator='')
(Experimental) Pretty-printed representation that may be evaluated with eval. See the pprint() function for more details.
-
classmethod print_param_defaults(*args, **kwargs)
Inspect .param.print_param_defaults method for the full docstring.
-
print_param_values(**kwargs)
Inspect .param.print_param_values method for the full docstring.
-
property sample_index_name
Returns the name of the sample index.
-
script_repr(imports=[], prefix=' ')
Variant of __repr__ designed for generating a runnable script.
-
property selection
Returns the currently selected data.
-
classmethod set_default(*args, **kwargs)
Inspect .param.set_default method for the full docstring.
-
set_dynamic_time_fn
Deprecated stub; inspect .param.set_dynamic_time_fn method for the full docstring.
-
set_param
Deprecated stub; inspect .param.set_param method for the full docstring.
-
state_pop()
Restore the most recently saved state.
See state_push() for more details.
-
state_push()
Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose(**kwargs)
Inspect .param.verbose method for the full docstring.
-
warning(**kwargs)
Inspect .param.warning method for the full docstring.
-
property x_count_data
Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An xarray.Dataset selection of the currently active ‘x_data’.
-
property y_annotation_data
Returns the currently selected ‘y_data’, or None, based on the annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.
-
class GSForge.models.OperationInterface(*args, **params)
Bases: GSForge.models._Interface.Interface, param.parameterized.ParameterizedFunction
Abstract class for a GEMOperation.
Every GEMOperation undergoes some argument parsing, then calls self.process(), which must be implemented by subclasses.
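The call pattern described above (parse arguments, then delegate to a subclass’s process()) can be sketched with the standard library’s abc module. OperationSketch and MeanCount are hypothetical names; the real OperationInterface builds on param.ParameterizedFunction, whose parameter handling is richer than the dictionary merge shown here.

```python
from abc import ABC, abstractmethod


class OperationSketch(ABC):
    """Hypothetical stand-in for OperationInterface's call pattern."""

    def __init__(self, **defaults):
        self.defaults = defaults

    def __call__(self, *args, **params):
        # Argument parsing: call-time params override the stored defaults.
        merged = {**self.defaults, **params}
        return self.process(*args, **merged)

    @abstractmethod
    def process(self, *args, **params):
        """Subclasses implement the actual operation here."""


class MeanCount(OperationSketch):
    """Hypothetical operation: mean count per gene."""

    def process(self, counts, **params):
        return {gene: sum(row) / len(row) for gene, row in counts.items()}


mean_count = MeanCount()
print(mean_count({"g1": [1, 2, 3], "g2": [4, 4, 4]}))  # {'g1': 2.0, 'g2': 4.0}
```

Because process() is abstract, instantiating OperationSketch directly raises TypeError, mirroring the requirement that every concrete operation defines it.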
gem = param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>)
An AnnotatedGEM object.
gene_set_collection = param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>)
A GeneSetCollection object.
selected_gene_sets = param.ListSelector(default=[None], objects=[])
A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are used to select sets of genes from the count matrix.
selected_genes = param.Parameter()
A list of genes to use when indexing the count matrix. This parameter takes priority over all other gene-selection methods, so any selected gene sets (or combinations thereof) have no effect.
gene_set_mode = param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection'])
Controls how any selected gene sets are returned by the interface.
+ complete: Returns the entire gene index of the AnnotatedGEM.
+ union: Returns the union of the selected gene sets' support.
+ intersection: Returns the intersection of the selected gene sets' support.
sample_subset = param.Parameter()
A list of samples to use in a given operation. These can be supplied directly as a list of samples, or can be drawn from a given GeneSet.
count_variable = param.String()
The name of the count matrix used.
annotation_variables = param.Parameter()
The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask = param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped'])
The type of mask to use for the count matrix.
+ 'complete' returns the entire count matrix as numbers.
+ 'masked' returns the entire count matrix with zero or missing values as NaN.
+ 'dropped' returns the count matrix without genes that have zero or missing values.
annotation_mask = param.ObjectSelector(default='complete', objects=['complete', 'dropped'])
The type of mask to use for the target array.
+ 'complete' returns the entire target array.
+ 'masked' returns the entire target array with zero or missing values as NaN.
+ 'dropped' returns the target array without samples that have zero or missing values.
count_transform = param.Callable()
A transform that is run on the x_data supplied by this Interface. The transform runs on the selected subset of the matrix.
-
property active_count_variable
Returns the name of the currently active count matrix.
-
debug(**kwargs)
Inspect .param.debug method for the full docstring.
-
defaults(**kwargs)
Inspect .param.defaults method for the full docstring.
-
force_new_dynamic_value
Deprecated stub; inspect .param.force_new_dynamic_value method for the full docstring.
-
property gene_index_name
Returns the name of the gene index.
-
get_gene_index(count_variable=None) → numpy.core.multiarray.array
Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values
Deprecated stub; inspect .param.get_param_values method for the full docstring.
-
get_sample_index() → numpy.core.multiarray.array
Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes() → dict
Returns the currently selected indexes as a dictionary.
-
get_value_generator
Deprecated stub; inspect .param.get_value_generator method for the full docstring.
-
inspect_value
Deprecated stub; inspect .param.inspect_value method for the full docstring.
-
instance
Inherited from param.ParameterizedFunction.instance.
-
message(**kwargs)
Inspect .param.message method for the full docstring.
-
params
Deprecated stub; inspect .param.params method for the full docstring.
-
pprint(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')
Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod print_param_defaults(*args, **kwargs)
Inspect .param.print_param_defaults method for the full docstring.
-
print_param_values(**kwargs)
Inspect .param.print_param_values method for the full docstring.
-
property sample_index_name
Returns the name of the sample index.
-
script_repr(imports=[], prefix=' ')
Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property selection
Returns the currently selected data.
-
classmethod set_default(*args, **kwargs)
Inspect .param.set_default method for the full docstring.
-
set_dynamic_time_fn
Deprecated stub; inspect .param.set_dynamic_time_fn method for the full docstring.
-
set_param
Deprecated stub; inspect .param.set_param method for the full docstring.
-
state_pop()
Restore the most recently saved state.
See state_push() for more details.
-
state_push()
Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose(**kwargs)
Inspect .param.verbose method for the full docstring.
-
warning(**kwargs)
Inspect .param.warning method for the full docstring.
-
property x_count_data
Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An xarray.Dataset selection of the currently active ‘x_data’.
-
property y_annotation_data
Returns the currently selected ‘y_data’, or None, based on the annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.