GSForge.models package¶

Module contents¶

There are two core data models in GSForge, both of which store their associated data in an xarray.Dataset object under a data attribute. Consult the xarray documentation for any transform or selection not provided by GSForge.

Core Data Classes¶

AnnotatedGEM: Contains the gene expression matrix, which is indexed by ‘Gene’ and ‘Sample’ coordinates. This xarray.Dataset object can also hold other data, such as phenotype information.
GeneSet: A GeneSet is a set of genes and any associated values. A GeneSet can be a set of ‘supported’ genes, i.e. genes that are ‘within’ a given GeneSet.

These core data classes are constructed with a limited set of packages:

numpy
pandas
xarray
param

This allows the creation of container images without interactive visualization libraries.

class GSForge.models.AnnotatedGEM(*args, **params)¶

Bases: param.parameterized.Parameterized

A data class for a gene expression matrix and any associated sample or gene annotations.

This model holds the count expression matrix, and any associated labels or annotations, as an xarray.Dataset object under the .data attribute. By default this dataset is expected to have its indexes named “Gene” and “Sample”, although there are parameters to override the array and index names used.

data = param.ClassSelector(readonly=False): An xarray.Dataset object that contains the Gene Expression Matrix, and any needed annotations. This xarray.Dataset object is expected to have a count array named ‘counts’, that has coordinates (‘Gene’, ‘Sample’).
count_array_name = param.String(readonly=False): This parameter controls which variable from the xarray.Dataset should be considered to be the ‘count’ variable. Consider using this if you require different index names, or wish to control which count array among many should be used by default.
sample_index_name = param.String(readonly=False): This parameter controls which variable from the xarray.Dataset should be considered to be the ‘sample’ coordinate. Consider using this if you require different coordinate names.
gene_index_name = param.String(readonly=False): This parameter controls which variable from the xarray.Dataset should be considered to be the ‘gene index’ coordinate. Consider using this if you require different coordinate names.

data = None¶

count_array_name = 'counts'¶

sample_index_name = 'Sample'¶

gene_index_name = 'Gene'¶

property gene_index: xarray.core.dataarray.DataArray¶

Returns the entire gene index of this AnnotatedGEM object as an xarray.DataArray.

The variable or coordinate that this returns is controlled by the gene_index_name parameter.

Returns: The complete gene index of this AnnotatedGEM.
Return type: xarray.DataArray

property sample_index: xarray.core.dataarray.DataArray¶

Returns the entire sample index of this AnnotatedGEM object as an xarray.DataArray.

The actual variable or coordinate that this returns is controlled by the sample_index_name parameter.

Returns: The complete sample index of this AnnotatedGEM.
Return type: xarray.DataArray

property count_array_names: List[str]¶

Returns a list of all available count arrays contained within this AnnotatedGEM object.

This is done simply by returning all data variables that have the same dimension set as the default count array.

Returns: A list of available count arrays in this AnnotatedGEM.
Return type: List[str]
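
The selection rule described for count_array_names can be sketched without xarray by modelling each data variable as a name mapped to its dimension names. The variable names and the helper same_dims_as_counts below are hypothetical; this only illustrates the rule and is not GSForge's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an xarray.Dataset: variable name -> dimension names.
variables: Dict[str, Tuple[str, ...]] = {
    "counts": ("Gene", "Sample"),
    "normalized_counts": ("Sample", "Gene"),
    "treatment": ("Sample",),
    "gene_length": ("Gene",),
}


def same_dims_as_counts(dims_by_name: Dict[str, Tuple[str, ...]],
                        count_array_name: str = "counts") -> List[str]:
    """Return the variables whose dimension set equals that of the count array."""
    target = frozenset(dims_by_name[count_array_name])
    return [name for name, dims in dims_by_name.items()
            if frozenset(dims) == target]


count_array_names = same_dims_as_counts(variables)
```

Comparing dimension sets rather than tuples means that the order of the dimensions does not matter in this sketch.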

infer_variables(quantile_size: int = 10, skip: Optional[bool] = None) → Dict[str, numpy.ndarray]¶

Infer categories for the variables in the AnnotatedGEM’s labels.

Parameters

quantile_size (int) – The maximum number of unique elements before a variable is no longer considered as a quantile-able set of values.
skip (bool) – The variables to be skipped.

Returns

A dictionary of the inferred value types.

Return type

Dict[str, numpy.ndarray]

classmethod from_netcdf(netcdf_path: Union[str, pathlib.Path, IO], **params) → GSForge.models._AnnotatedGEM.AnnotatedGEM¶

Construct an AnnotatedGEM object from a netcdf (.nc) file path.

Parameters: netcdf_path (Union[str, Path, IO[AnyStr]]) – A path to a netcdf file. If this file has different index names than default (Gene, Sample, counts), be sure to explicitly set those parameters (gene_index_name, sample_index_name, count_array_name).
Returns: A new instance of the AnnotatedGEM class.
Return type: AnnotatedGEM

classmethod from_pandas(count_df: pandas.core.frame.DataFrame, label_df: Optional[pandas.core.frame.DataFrame] = None, **params) → GSForge.models._AnnotatedGEM.AnnotatedGEM¶

Reads in a GEM pandas.DataFrame and an optional annotation DataFrame. These must share the same sample index.

Parameters

count_df (pd.DataFrame) – The gene expression matrix as a pandas.DataFrame. This DataFrame is assumed to have genes as rows and samples as columns.
label_df (pd.DataFrame) – The sample annotation data as a pandas.DataFrame. This DataFrame is assumed to have samples as rows and annotation observations as columns.

Returns

A new instance of the AnnotatedGEM class.

Return type

AnnotatedGEM

static xrarray_gem_from_pandas(count_df: pandas.core.frame.DataFrame, label_df: Optional[pandas.core.frame.DataFrame] = None, transpose_counts: bool = True) → xarray.core.dataset.Dataset¶

Stitch together gene expression and annotation DataFrames into a single xarray.Dataset object.

Parameters

count_df (pd.DataFrame) – The gene expression matrix as a pandas.DataFrame; assumed to have genes as rows and samples as columns.
label_df (pd.DataFrame) – The sample annotation data as a pandas.DataFrame; assumed to have samples as rows and annotations as columns.
transpose_counts (bool) – Transpose the count matrix from (genes as rows, samples as columns) to (samples as rows, genes as columns).

Returns

An xarray.Dataset containing the gene expression matrix and the annotation data.

Return type

xarray.Dataset
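
The effect of transpose_counts can be illustrated with plain Python lists. The gene names, sample names and values below are made up for illustration. With a pandas.DataFrame the same reorientation is available as DataFrame.T.

```python
# Hypothetical toy matrix: genes as rows, samples as columns.
genes = ["gene_a", "gene_b", "gene_c"]
samples = ["s1", "s2"]
counts_by_gene = [
    [10, 0],  # gene_a
    [3, 7],   # gene_b
    [0, 5],   # gene_c
]

# Transposing swaps the axes so that each row is a sample
# and each column is a gene.
counts_by_sample = [list(row) for row in zip(*counts_by_gene)]
```

After the transpose, rows line up with the sample annotations, which are also given with samples as rows.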

classmethod from_files(count_path: Union[str, pathlib.Path, IO], label_path: Optional[Union[str, pathlib.Path, IO]] = None, count_kwargs: Optional[dict] = None, label_kwargs: Optional[dict] = None, transpose_counts: bool = True, **params) → GSForge.models._AnnotatedGEM.AnnotatedGEM¶

Construct an AnnotatedGEM object from file paths and optional parsing arguments.

Parameters

count_path (Union[str, Path, IO[AnyStr]]) – Path to the gene expression matrix.
label_path (Union[str, Path, IO[AnyStr]]) – Path to the sample annotation data.
count_kwargs (dict) – A dictionary of arguments to be passed to pandas.read_csv for the count matrix.
label_kwargs (dict) – A dictionary of arguments to be passed to pandas.read_csv for the annotations.

Returns

A new instance of the AnnotatedGEM class.

Return type

AnnotatedGEM

classmethod from_geo_id(geo_id: str, destination: str = './') → GSForge.models._AnnotatedGEM.AnnotatedGEM¶

save(path: Union[str, pathlib.Path, IO], **kwargs) → str¶

Save as a netcdf (.nc) file at path.

Parameters: path (Union[str, Path, IO[AnyStr]]) – The filepath to save to. This should use the .nc extension.
Returns: The path to which the file was saved.
Return type: str

name = 'AnnotatedGEM'¶

class GSForge.models.GeneSet(*args, **params)¶

Bases: param.parameterized.Parameterized

A data class for the result of a gene selection or analysis.

A GeneSet can also be a measurement or ranking of a set of genes, and this could include all of the ‘available’ genes. In such cases a boolean array ‘support’ indicates membership in the GeneSet.

Create a GeneSet from a .netcdf file path, pandas.DataFrame, np.ndarray or list of genes:

# Supply any of the above objects along with any other parameters to create a GeneSet.
my_geneset = GeneSet(<pandas.DataFrame, xarray.Dataset, numpy.ndarray, str>)

# One can also explicitly call the constructors for the types above, e.g.:
my_geneset = GeneSet.from_pandas(<pandas.DataFrame>)

Get supported Genes:

my_geneset.get_support()

Set the support with a list or array of genes:

my_geneset.set_support_by_genes(my_genes)

data = param.Parameter(readonly=False): Contains a gene-indexed xarray.Dataset object. Its index should contain only those genes that are considered ‘within’ the GeneSet, or it should have a boolean variable named ‘support’.
support_index_name = param.String(readonly=False): This parameter controls which variable should be considered to be the (boolean) variable indicating membership in this GeneSet.
gene_index_name = param.String(readonly=False): This parameter controls which variable from the xarray.Dataset should be considered to be the ‘gene index’ coordinate. Consider using this if you require different coordinate names.

data = None¶

support_index_name = 'support'¶

gene_index_name = 'Gene'¶

classmethod from_pandas(dataframe: pandas.core.frame.DataFrame, genes: Optional[numpy.ndarray] = None, attrs=None, **params)¶

Create a GeneSet from a pandas.DataFrame.

Parameters

dataframe (pd.DataFrame) – A pandas.DataFrame object. Assumed to be indexed by genes names.
genes (np.ndarray) – If you have a separate (but ordered the same!) gene array that corresponds to your data, it can be passed here to be set as the index appropriately.
attrs (dict) – A dictionary of attributes to be added to the xarray.Dataset.attrs attribute.
params (dict) – Other parameters to set.

Returns

A new GeneSet object.

Return type

GeneSet

classmethod from_GeneSets(*gene_sets: GSForge.models._GeneSet.GeneSet, mode: str = 'union', attrs=None, **params) → GSForge.models._GeneSet.GeneSet¶

Create a new GeneSet by combining all the genes in the given GeneSets.

No variables or attributes from the original GeneSets are maintained in this process.

Parameters

*gene_sets (GeneSet) – One or more GSForge.GeneSet objects.
mode (str) – Mode by which to combine the given GeneSet objects.
attrs (dict) – A dictionary of attributes to be added to the xarray.Dataset.attrs attribute.
params (dict) – Other parameters to set.

Returns

A new GeneSet built from the given GeneSets as described by mode.

Return type

GeneSet

classmethod from_bool_array(bool_array: numpy.ndarray, complete_gene_index: numpy.ndarray, attrs=None, **params) → GSForge.models._GeneSet.GeneSet¶

Create a GeneSet object from a boolean support array. This requires a matching gene index array.

Parameters

bool_array (np.ndarray) – A boolean array representing support within this GeneSet.
complete_gene_index (np.ndarray) – The complete gene index.
attrs (dict) – A dictionary of attributes to be added to the xarray.Dataset.attrs attribute.
params (dict) – Other parameters to set.

Returns

A new GeneSet object.

Return type

GeneSet
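
The relationship between a boolean support array and its matching gene index can be sketched with the standard library alone. The gene names are hypothetical, and the length check is an assumption about what "matching" requires rather than documented GSForge behaviour.

```python
from itertools import compress

# Hypothetical complete gene index and a boolean support array of equal length.
complete_gene_index = ["gene_a", "gene_b", "gene_c", "gene_d"]
bool_array = [True, False, True, False]

if len(bool_array) != len(complete_gene_index):
    raise ValueError("bool_array must have one entry per gene in the index")

# Keep only the genes whose support flag is True.
supported_genes = list(compress(complete_gene_index, bool_array))
```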

classmethod from_gene_array(selected_gene_array: numpy.ndarray, complete_gene_index=None, attrs=None, **params) → GSForge.models._GeneSet.GeneSet¶

Parses arguments for a new GeneSet from an array or list of ‘selected’ genes. Such genes are assumed to be within the optionally supplied complete_gene_index.

Parameters

selected_gene_array (np.ndarray) – The genes ‘selected’ to be within the support of this GeneSet.
complete_gene_index (np.ndarray) – Optional. The complete gene index to which those selected genes belong.
attrs (dict) – A dictionary of attributes to be added to the xarray.Dataset.attrs attribute.
params (dict) – Other parameters to set.

Returns

A new GeneSet object.

Return type

GeneSet

classmethod from_xarray_dataset(data: xarray.core.dataset.Dataset, **params) → GSForge.models._GeneSet.GeneSet¶

Create a GeneSet from an xarray.Dataset.

Parameters

data (xr.Dataset) – An xarray.Dataset object. See the .data parameter of this class.
params (dict) – Other parameters to set.

Returns

A new GeneSet object.

Return type

GeneSet

classmethod from_netcdf(path: Union[str, pathlib.Path, IO], **params)¶

Create a GeneSet object from a netcdf file path.

Parameters

path (Union[str, Path, IO[AnyStr]]) – The path to the .netcdf file to be used.
params (dict) – Other parameters to set.

Returns

A new GeneSet object.

Return type

GeneSet

property gene_index: xarray.core.dataarray.DataArray¶

Returns the entire gene index of this GeneSet object as an xarray.DataArray.

The variable or coordinate that this returns is controlled by the gene_index_name parameter.

Returns: A copy of the entire gene index of this GeneSet.
Return type: xarray.DataArray

get_support() → numpy.ndarray¶

Returns the list of genes ‘supported’ in this GeneSet.

The value that this returns is (by default) controlled by the self.support_index_name parameter.

Returns: A numpy array of the genes ‘supported’ by this GeneSet.
Return type: numpy.ndarray

property support_exists: bool¶: Returns True if a support array exists and has at least one member; returns False otherwise.

set_support_by_genes(genes: numpy.ndarray) → GSForge.models._GeneSet.GeneSet¶

Set this GeneSet support to the given genes. This function calculates the boolean support array for the gene index via np.isin(gene_index, genes). Returns an updated copy of the GeneSet.

Parameters: genes (np.ndarray) – An array of genes which represent the “supported” subset within the entire gene index.
Returns: An updated copy of the GeneSet.
Return type: GeneSet
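
The membership test behind set_support_by_genes is documented as np.isin(gene_index, genes). A dependency-free sketch of the same element-wise test is shown here; the gene names are hypothetical.

```python
# Hypothetical gene index and a list of genes to mark as supported.
gene_index = ["gene_a", "gene_b", "gene_c", "gene_d"]
genes = ["gene_c", "gene_a", "gene_x"]  # gene_x is not in the index

# One boolean per entry of gene_index: True where the gene is in `genes`.
wanted = set(genes)
support = [gene in wanted for gene in gene_index]
```

The mask follows the order of gene_index, not the order of genes, and in this sketch names that are absent from the index have no effect.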

set_support_from_boolean_array(boolean_array: numpy.ndarray) → GSForge.models._GeneSet.GeneSet¶

Set this GeneSet support based on the given boolean array, which must be the same length as the existing gene index. Returns an updated copy of the GeneSet.

This function calculates the boolean support array for the gene index via np.isin(gene_index, genes).

Parameters: boolean_array (numpy.ndarray) – A boolean numpy.ndarray.
Returns: An updated copy of the GeneSet.
Return type: GeneSet

get_genes_by_threshold(threshold, score_variable: str, comparison: str = 'ge', within_support: bool = True, absolute: bool = True) → numpy.ndarray¶

get_top_n_genes(score_variable: str, n: int = 1000, within_support: bool = True, absolute: bool = True) → numpy.ndarray¶
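
Neither get_genes_by_threshold nor get_top_n_genes is described here, so the following is only a guess at their semantics based on the parameter names. It assumes that comparison names a function in Python's operator module (such as 'ge') and that absolute means scores are compared by magnitude. The helper names and scores are hypothetical.

```python
import heapq
import operator
from typing import Dict, List

# Hypothetical per-gene scores, e.g. log fold changes.
scores: Dict[str, float] = {
    "gene_a": 0.2, "gene_b": -3.1, "gene_c": 1.5, "gene_d": -0.4,
}


def genes_by_threshold(score_by_gene: Dict[str, float], threshold: float,
                       comparison: str = "ge", absolute: bool = True) -> List[str]:
    """Keep genes whose (optionally absolute) score passes the comparison."""
    compare = getattr(operator, comparison)  # assumed mapping, e.g. 'ge' -> operator.ge
    return [gene for gene, score in score_by_gene.items()
            if compare(abs(score) if absolute else score, threshold)]


def top_n_genes(score_by_gene: Dict[str, float], n: int,
                absolute: bool = True) -> List[str]:
    """Return the n genes with the largest (optionally absolute) scores."""
    def rank(gene: str) -> float:
        score = score_by_gene[gene]
        return abs(score) if absolute else score
    return heapq.nlargest(n, score_by_gene, key=rank)


above_one = genes_by_threshold(scores, 1.0)
top_two_absolute = top_n_genes(scores, 2)
top_two_signed = top_n_genes(scores, 2, absolute=False)
```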

to_dataframe(only_supported: bool = True) → pandas.core.frame.DataFrame¶

Convert this GeneSet.data attribute to a pandas.DataFrame. This restricts the data returned to include only those genes that are returned by GeneSet.get_support().

Parameters: only_supported (bool) – Defaults to True, set to False if you want all GeneSet data to be in the DataFrame returned.
Returns: A pandas.DataFrame of this GeneSet.data attribute.
Return type: pandas.DataFrame

save_as_netcdf(target_dir=None, name=None) → str¶

Save this GeneSet as a netcdf (.nc) file in the target_dir directory.

The default filename will be: {GeneSet.name}.nc, if the GeneSet does not have a name, one must be provided via the name argument.

Parameters

target_dir (str) – The directory to place the saved GeneSet into.
name (str) – The name to give the GeneSet upon saving.

Returns

The path to which the file was saved.

Return type

str

name = 'GeneSet'¶

class GSForge.models.GeneSetCollection(**params)¶

Bases: param.parameterized.Parameterized

An interface class which contains an AnnotatedGEM and a dictionary of GeneSet objects.

gem = param.ClassSelector(readonly=False): A GSForge.AnnotatedGEM object.

gem = None¶

summarize_gene_sets() → Dict[str, int]¶: Summarize this GeneSetCollection by returning a dictionary of {gene_set_name: support_length}. This is used to generate the display shown by the __repr__ function.

get_support(key: str) → numpy.ndarray¶

Get the support array for a given key.

Parameters: key (str) – The GeneSet from which to get the gene support.
Returns: An array of the genes that make up the support of this GeneSet.
Return type: np.ndarray

gene_sets_to_dataframes(keys: Optional[List[str]] = None, only_supported: bool = True) → Dict[str, pandas.core.frame.DataFrame]¶

Returns a dictionary of {key: pd.DataFrame} of the GeneSet.data. The DataFrame is limited to only those genes that are ‘supported’ within the GeneSet by default.

Parameters

keys (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
only_supported (bool) – Whether to return a subset defined by each GeneSet support, or the complete data frame.

Returns

A dictionary of {key: pd.DataFrame} of the GeneSet.data attribute.

Return type

dict

gene_sets_to_csv_files(target_dir: Optional[str] = None, keys: Optional[List[str]] = None, only_supported: bool = True) → None¶

Writes GeneSet.data as .csv files.

By default this creates a folder within the current working directory and saves the .csv files within it. By default only genes that are “supported” by a GeneSet are included.

Parameters

target_dir – The target directory to save the .csv files to. This defaults to the name of this GeneSetCollection, which creates a folder in the current working directory.
keys (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
only_supported (bool) – Whether to return a subset defined by each GeneSet support, or the complete data frame.

Returns

Return type

None

gene_sets_to_excel_sheet(name: Optional[str] = None, keys: Optional[List[str]] = None, only_supported: bool = True) → None¶

Writes the GeneSet.data within this GeneSetCollection as a single Excel worksheet.

By default this sheet is named using the .name of this GeneSetCollection. By default only genes that are “supported” by a GeneSet are included.

Parameters

name (str) – The name of the Excel sheet. .xlsx will be appended to the given name.
keys (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
only_supported (bool) – Whether to return a subset defined by each GeneSet support, or the complete data frame.

Returns

Return type

None

as_dict(keys: Optional[List[str]] = None, exclude: Optional[List[str]] = None, empty_supports: bool = False) → Dict[str, numpy.ndarray]¶

Returns a dictionary of {name: supported_genes} for each GeneSet, or those specified by the keys argument.

Parameters

keys (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
exclude (List[str]) – An optional list of GeneSet keys to exclude from the returned dictionary.
empty_supports – Whether to include GeneSets that have no support array, or no genes supported within the support array.

Returns

Dictionary of {name: supported_genes} for each GeneSet.

Return type

dict
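
The keys, exclude and empty_supports arguments recur across the methods of this class. A minimal sketch of how such a filter could behave is given here, using plain lists in place of GeneSet supports. The helper name and the set names are hypothetical, and the exact precedence between keys and exclude in GSForge is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical supports: GeneSet name -> supported genes.
gene_sets: Dict[str, List[str]] = {
    "deseq2": ["gene_a", "gene_b"],
    "boruta": ["gene_b", "gene_c"],
    "unfit": [],  # a GeneSet with no supported genes
}


def select_gene_sets(sets: Dict[str, List[str]],
                     keys: Optional[List[str]] = None,
                     exclude: Optional[List[str]] = None,
                     empty_supports: bool = False) -> Dict[str, List[str]]:
    """Pick the requested sets, drop excluded ones and (optionally) empty ones."""
    selected = list(sets) if keys is None else list(keys)
    excluded = set(exclude or [])
    result = {}
    for key in selected:
        if key in excluded:
            continue
        if not sets[key] and not empty_supports:
            continue
        result[key] = sets[key]
    return result


default_selection = select_gene_sets(gene_sets)
with_empty = select_gene_sets(gene_sets, exclude=["boruta"], empty_supports=True)
```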

intersection(keys: Optional[List[str]] = None, exclude: Optional[List[str]] = None) → numpy.ndarray¶

Return the intersection of supported genes in this GeneSet collection.

Parameters

keys (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
exclude (List[str]) – An optional list of GeneSet keys to exclude from the returned dictionary.

Returns

Intersection of the supported genes within GeneSets.

Return type

np.ndarray

union(keys: Optional[List[str]] = None, exclude: Optional[List[str]] = None) → numpy.ndarray¶

Get the union of supported genes in this GeneSet collection.

Parameters

keys (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
exclude (List[str]) – An optional list of GeneSet keys to exclude from the returned dictionary.

Returns

Union of the supported genes within GeneSets.

Return type

np.ndarray
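
The intersection and union of several supports reduce to ordinary set operations. The sketch below uses Python sets with hypothetical names. GSForge returns numpy arrays, so the sorted lists here are only a stand-in.

```python
# Hypothetical supports: GeneSet name -> set of supported genes.
supports = {
    "set_one": {"gene_a", "gene_b", "gene_c"},
    "set_two": {"gene_b", "gene_c", "gene_d"},
    "set_three": {"gene_c", "gene_d", "gene_e"},
}

# Genes present in every selected set.
intersection = sorted(set.intersection(*supports.values()))

# Genes present in at least one selected set.
union = sorted(set().union(*supports.values()))
```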

difference(primary_key: str, other_keys: Optional[List[str]] = None, mode: str = 'union') → numpy.ndarray¶

Finds the genes within primary_key that are not within the mode of the sets given in other_keys.

If no other_keys are provided, all remaining keys are used. The default mode is union.

Parameters

primary_key (str) – The key of the GeneSet whose genes are compared against the other sets.
other_keys (List[str]) – An optional list of GeneSet keys…
mode (str) – Mode by which to join the GeneSets given by other_keys.

Returns

…

Return type

np.ndarray
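
The difference operation can be sketched as a set subtraction: join the other sets according to mode, then remove the result from the primary set. Only the default 'union' mode is documented above, so the 'intersection' branch is an assumption, as are the helper and set names.

```python
from typing import Dict, List, Optional, Set

supports: Dict[str, Set[str]] = {
    "set_one": {"gene_a", "gene_b", "gene_c"},
    "set_two": {"gene_b", "gene_c", "gene_d"},
    "set_three": {"gene_c", "gene_d", "gene_e"},
}


def difference(sets: Dict[str, Set[str]], primary_key: str,
               other_keys: Optional[List[str]] = None,
               mode: str = "union") -> Set[str]:
    """Genes in the primary set that are not in the joined other sets."""
    if other_keys is None:
        # With no other_keys given, compare against all remaining sets.
        other_keys = [key for key in sets if key != primary_key]
    others = [sets[key] for key in other_keys]
    if mode == "union":
        joined = set().union(*others)
    elif mode == "intersection":  # assumed to be a valid mode
        joined = set.intersection(*others) if others else set()
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sets[primary_key] - joined


unique_to_one = difference(supports, "set_one")
not_in_both_others = difference(supports, "set_one", mode="intersection")
```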

joint_difference(primary_keys: List[str], other_keys: Optional[List[str]] = None, primary_join_mode: str = 'union', others_join_mode: str = 'union')¶

Parameters

primary_keys –
other_keys –
primary_join_mode –
others_join_mode –

pairwise_unions(keys: Optional[List[str]] = None, exclude: Optional[List[str]] = None) → Dict[Tuple[str, str], numpy.ndarray]¶

Construct pairwise permutations of GeneSets within this collection, and return the union of each pair in a dictionary.

Parameters

keys (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
exclude (List[str]) – An optional list of GeneSet keys to exclude from the returned dictionary.

Returns

A dictionary of {(GeneSet.name, GeneSet.name): gene support union}.

Return type

dict

pairwise_intersection(keys: Optional[List[str]] = None, exclude: Optional[List[str]] = None) → Dict[Tuple[str, str], numpy.ndarray]¶

Construct pairwise combinations of GeneSets within this collection, and return the intersection of each pair in a dictionary.

Parameters

keys (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
exclude (List[str]) – An optional list of GeneSet keys to exclude from the returned dictionary.

Returns

A dictionary of {(GeneSet.name, GeneSet.name): GeneSets.get_support() intersection}.

Return type

dict
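
The descriptions above speak of pairwise permutations for unions and pairwise combinations for intersections. The difference between the two pairings can be seen with itertools; the set names below are hypothetical and the dictionaries are plain-Python stand-ins for what the methods return.

```python
from itertools import combinations, permutations

supports = {
    "set_one": {"gene_a", "gene_b"},
    "set_two": {"gene_b", "gene_c"},
    "set_three": {"gene_c", "gene_d"},
}

# Permutations yield ordered pairs, so (a, b) and (b, a) both appear.
pairwise_unions = {
    (a, b): supports[a] | supports[b] for a, b in permutations(supports, 2)
}

# Combinations yield each unordered pair exactly once.
pairwise_intersections = {
    (a, b): supports[a] & supports[b] for a, b in combinations(supports, 2)
}
```

For three sets this gives six union entries and three intersection entries. Because union is symmetric, each union appears twice under swapped keys.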

pairwise_percent_intersection(keys=None, exclude=None) → List[Tuple[str, str, float]]¶

Construct pairwise permutations of GeneSets within this collection, and return the percent intersection of each pair as a list of (name, name, value) tuples.

Parameters

keys (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
exclude (List[str]) – An optional list of GeneSet keys to exclude from the returned dictionary.

Returns

A list of (GeneSet.name, GeneSet.name, percent gene intersection) tuples.

Return type

List[Tuple[str, str, float]]
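
The denominator of the percent intersection is not stated here. The sketch below assumes it is the size of the first set in each ordered pair, which would explain why permutations rather than combinations are used. It returns fractions between 0 and 1, in the List[Tuple[str, str, float]] shape given by the signature. All names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

supports: Dict[str, Set[str]] = {
    "set_one": {"gene_a", "gene_b", "gene_c", "gene_d"},
    "set_two": {"gene_c", "gene_d"},
}


def pairwise_fraction_shared(sets: Dict[str, Set[str]]) -> List[Tuple[str, str, float]]:
    """For each ordered pair (a, b): the share of a's genes that are also in b."""
    return [
        (a, b, len(sets[a] & sets[b]) / len(sets[a]))
        for a, b in permutations(sets, 2)
        if sets[a]  # skip empty sets to avoid dividing by zero
    ]


shared = pairwise_fraction_shared(supports)
```

Under this assumption the measure is asymmetric: half of set_one lies in set_two, while all of set_two lies in set_one.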

construct_standard_specification(include: Optional[List[str]] = None, exclude=None) → dict¶

Construct a standard specification that can be used to view unions, intersections and differences (unique genes) of the sets within this collection.

Parameters

include (List[str]) – An optional list of gene_set keys to return, by default all keys are selected.
exclude (List[str]) – An optional list of GeneSet keys to exclude from the returned dictionary.

Returns

A specification dictionary.

Return type

dict

static merge_specifications(*specs)¶: Merges sets of defaultdict(list) objects with common keys.
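
One plausible reading of merging defaultdict(list) objects with common keys is that the lists stored under a shared key are concatenated. The sketch below follows that reading. The specification layout (operation name mapped to lists of GeneSet keys) is hypothetical and not taken from GSForge.

```python
from collections import defaultdict

# Hypothetical specifications: operation name -> list of argument lists.
spec_a = defaultdict(list, {"union": [["set_one", "set_two"]]})
spec_b = defaultdict(list, {
    "union": [["set_two", "set_three"]],
    "intersection": [["set_one", "set_two"]],
})


def merge_specs(*specs):
    """Concatenate the lists found under each key across all specifications."""
    merged = defaultdict(list)
    for spec in specs:
        for key, values in spec.items():
            merged[key].extend(values)
    return merged


merged = merge_specs(spec_a, spec_b)
```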

process_set_operation_specification(specification: Optional[dict] = None) → dict¶

Calls and stores the results from a specification. The specification must declare set operation functions and their arguments.

Parameters: specification (Dict) –

classmethod from_specification(source_collection, specification=None, name='processed_specification')¶

classmethod from_folder(gem: GSForge.models._AnnotatedGEM.AnnotatedGEM, target_dir: Union[str, pathlib.Path, IO], glob_filter: str = '*.nc', filter_func: Optional[Callable] = None, **params) → GSForge.models._GeneSetCollection.GeneSetCollection¶

Create a GeneSetCollection from a directory of saved GeneSet objects.

The file name of each gene_set.nc file will be used as the key in the gene_sets dictionary.

Parameters

gem (AnnotatedGEM) – A GSForge.AnnotatedGEM object.
target_dir (Union[str, Path, IO[AnyStr]]) – The directory which contains the saved GeneSet .netcdf files.
glob_filter (str) – A glob by which to restrict the files found within target_dir.
filter_func (Callable) – A function by which to filter which xarray.Dataset objects are included. This function should take an xarray.Dataset and return a boolean.
params – Parameters to configure the GeneSetCollection.

Returns

A new GeneSetCollection.

Return type

GeneSetCollection
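
The file discovery step of from_folder can be sketched with pathlib. The example creates empty placeholder files in a temporary directory and applies the default glob_filter of '*.nc'. Using the file stem (the name without the extension) as the key is an assumption; the text above only says that the file name is used.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target_dir = Path(tmp)
    # Hypothetical directory contents; the files are empty placeholders.
    for filename in ("deseq2.nc", "boruta.nc", "notes.txt"):
        (target_dir / filename).touch()

    # Only files matching the glob are considered; key each one by its stem.
    found = {path.stem: path.name for path in sorted(target_dir.glob("*.nc"))}
```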

save(target_dir: str, keys: Optional[List[str]] = None) → None¶

Save this collection to target_dir. Each GeneSet will be saved as a separate .netcdf file within this directory.

Parameters

target_dir (str) – The path to which GeneSet xarray.Dataset .netcdf files will be written.
keys (List[str]) – The list of GeneSet keys that should be saved. If this is not provided, all GeneSet objects are saved.

Returns

Return type

None

name = 'GeneSetCollection'¶

class GSForge.models.Interface(*args, **params)¶

Bases: param.parameterized.Parameterized

The Interface provides common API access for interacting with the AnnotatedGEM and GeneSetCollection objects.

gem = param.ClassSelector(readonly=False): An AnnotatedGEM object.
gene_set_collection = param.ClassSelector(readonly=False): A GeneSetCollection object.
selected_gene_sets = param.ListSelector(readonly=False): A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are to be used for selecting sets of genes from the count matrix.
selected_genes = param.Parameter(readonly=False): A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selecting methods. That means that selected GeneSets (or combinations thereof) will have no effect.
gene_set_mode = param.ObjectSelector(readonly=False): Controls how any selected gene sets are returned by the interface. ‘complete’ returns the entire gene set of the AnnotatedGEM; ‘union’ returns the union of the selected gene sets' support; ‘intersection’ returns the intersection of the selected gene sets' support.
sample_subset = param.Parameter(readonly=False): A list of samples to use in a given operation. These can be supplied directly as a list of genes, or can be drawn from a given GeneSet.
count_variable = param.ObjectSelector(readonly=False): The name of the count matrix used.
annotation_variables = param.List(readonly=False): The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask = param.ObjectSelector(readonly=False): The type of mask to use for the count matrix. ‘complete’ returns the entire count matrix as numbers; ‘masked’ returns the entire count matrix with zero or missing values as NaN; ‘dropped’ returns the count matrix without genes that have zero or missing values.
annotation_mask = param.ObjectSelector(readonly=False): The type of mask to use for the target array. ‘complete’ returns the entire target array; ‘dropped’ returns the target array without samples that have zero or missing values.
count_transform = param.Callable(readonly=False): A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
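
The priority rules among selected_genes, selected_gene_sets and gene_set_mode can be sketched as a small function. This is an illustration under assumptions, not the Interface implementation. In particular, falling back to the complete index when no gene sets are selected is a guess, and all names and data are hypothetical.

```python
from typing import Dict, List, Optional

complete_index = ["gene_a", "gene_b", "gene_c", "gene_d"]
gene_sets: Dict[str, List[str]] = {
    "set_one": ["gene_a", "gene_b"],
    "set_two": ["gene_b", "gene_c"],
}


def select_genes(index: List[str], sets: Dict[str, List[str]],
                 selected_gene_sets: Optional[List[str]] = None,
                 selected_genes: Optional[List[str]] = None,
                 gene_set_mode: str = "union") -> List[str]:
    """Resolve which genes to use, in the order of the complete index."""
    if selected_genes is not None:
        # An explicit gene list takes priority over any gene set selection.
        chosen = set(selected_genes)
    elif gene_set_mode == "complete" or not selected_gene_sets:
        return list(index)
    else:
        members = [set(sets[key]) for key in selected_gene_sets]
        if gene_set_mode == "union":
            chosen = set().union(*members)
        elif gene_set_mode == "intersection":
            chosen = set.intersection(*members)
        else:
            raise ValueError(f"unknown gene_set_mode: {gene_set_mode!r}")
    return [gene for gene in index if gene in chosen]


both = ["set_one", "set_two"]
by_union = select_genes(complete_index, gene_sets, selected_gene_sets=both)
by_intersection = select_genes(complete_index, gene_sets, selected_gene_sets=both,
                               gene_set_mode="intersection")
explicit = select_genes(complete_index, gene_sets, selected_gene_sets=both,
                        selected_genes=["gene_d"])
```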

gem = None¶

gene_set_collection = None¶

selected_gene_sets = [None]¶

selected_genes = None¶

gene_set_mode = 'union'¶

sample_subset = None¶

count_variable = None¶

annotation_variables = [None]¶

count_mask = 'complete'¶

annotation_mask = 'complete'¶

count_transform = None¶
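
The three count_mask options documented for this class can be sketched on a toy matrix stored as a dictionary of per-gene count lists. The data and the helper names are hypothetical, and the real Interface operates on xarray objects rather than dictionaries.

```python
import math
from typing import Dict, List

# Hypothetical counts: gene -> one value per sample.
counts: Dict[str, List[float]] = {
    "gene_a": [5, 0, 2],
    "gene_b": [1, 4, 3],
    "gene_c": [math.nan, 2, 2],
}


def is_zero_or_missing(value: float) -> bool:
    return value == 0 or math.isnan(value)


def apply_count_mask(data: Dict[str, List[float]],
                     count_mask: str = "complete") -> Dict[str, List[float]]:
    if count_mask == "complete":
        # Return everything unchanged.
        return {gene: list(values) for gene, values in data.items()}
    if count_mask == "masked":
        # Keep every gene, but replace zero or missing values with NaN.
        return {gene: [math.nan if is_zero_or_missing(v) else v for v in values]
                for gene, values in data.items()}
    if count_mask == "dropped":
        # Remove genes that contain any zero or missing value.
        return {gene: list(values) for gene, values in data.items()
                if not any(is_zero_or_missing(v) for v in values)}
    raise ValueError(f"unknown count_mask: {count_mask!r}")


masked = apply_count_mask(counts, "masked")
dropped = apply_count_mask(counts, "dropped")
```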

property active_count_variable: str¶: Returns the name of the currently active count matrix.

property gene_index_name: str¶: Returns the name of the gene index.

property sample_index_name: str¶: Returns the name of the sample index.

get_sample_index() → numpy.ndarray¶

Get the currently selected sample index as a numpy array.

Returns: An array of the currently selected samples.
Return type: np.ndarray

property get_selection_indices: dict¶: Returns the currently selected indexes as a dictionary.

property x_count_data: Optional[xarray.core.dataarray.DataArray]¶

Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.

Note: In constructing the gene index, the count data is constructed first in order to infer coordinate selection based on masking.

Returns: The selection of the currently active count data.
Return type: Optional[xarray.DataArray]

get_gene_index() → numpy.array¶

Get the currently selected gene index as a numpy array.

Returns: An array of the currently selected genes.
Return type: np.ndarray

property y_annotation_data: Optional[Union[xarray.core.dataset.Dataset, xarray.core.dataarray.DataArray]]¶

Returns the currently selected ‘y_data’, or None, based on the annotation_variables parameter.

Returns: The currently selected y_data, or None.
Return type: Optional[Union[xarray.Dataset, xarray.DataArray]]

get_gem_data(single_object=False, output_type='xarray', **params)¶

Returns count [and annotation] data based on the current parameters.

Users should call gsf.get_gem_data

name = 'Interface'¶

class GSForge.models.CallableInterface(**kwargs)¶

Bases: GSForge.models._Interface.Interface, param.parameterized.ParameterizedFunction

Parameters inherited from:

GSForge.models._Interface.Interface: gem, gene_set_collection, selected_gene_sets, selected_genes, gene_set_mode, sample_subset, count_variable, annotation_variables, count_mask, annotation_mask, count_transform

name = 'CallableInterface'¶