GSForge.utils Package
utils Package
Utility functions for GSForge.
R_interface Module
This module contains functions for preparing data for transfer to and from the R programming language.
Such functionality is powered by the rpy2 library.
No conversion function is needed for preparing labels, as one can use the built-in xarray method
to_dataframe, which returns a pandas.DataFrame:
label_df = labels.to_dataframe()
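A minimal sketch of this call, assuming labels is a named xarray.DataArray indexed by sample. The dimension name "Sample" and the variable name "Treatment" are illustrative assumptions, not names taken from GSForge.

```python
import xarray as xr

# Hypothetical sample labels; "Sample" and "Treatment" are illustrative names.
labels = xr.DataArray(
    ["control", "heat", "heat"],
    dims=("Sample",),
    coords={"Sample": ["s1", "s2", "s3"]},
    name="Treatment",  # a DataArray needs a name before to_dataframe() is called
)

# The result is a pandas.DataFrame indexed by "Sample" with one "Treatment" column.
label_df = labels.to_dataframe()
```

The resulting frame can then be handed to rpy2 in the same way as any other pandas.DataFrame.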
-
GSForge.utils.R_interface.Py_counts_to_R(counts: xarray.core.dataarray.DataArray)
Prepare a count xarray.DataArray as a pandas.DataFrame for transfer to the R programming language.
This function transposes the data to have genes as rows and samples as columns. It then converts to a pandas.DataFrame and removes extraneous index levels.
The inverse of this function is R_counts_to_Py_counts.
- Parameters
counts – An xr.DataArray count matrix.
- Returns count_df
A pandas.DataFrame ready to transfer to an R environment.
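The transposition described above can be sketched with plain xarray calls. This is an illustration under stated assumptions, not the GSForge implementation: the dimension names "Sample" and "Gene" and the helper name counts_to_gene_by_sample_frame are hypothetical.

```python
import numpy as np
import xarray as xr

# A small samples-by-genes count matrix with illustrative coordinate names.
counts = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("Sample", "Gene"),
    coords={"Sample": ["s1", "s2"], "Gene": ["g1", "g2", "g3"]},
    name="counts",
)


def counts_to_gene_by_sample_frame(count_array):
    """Return a genes-as-rows, samples-as-columns pandas.DataFrame (hypothetical helper)."""
    # transpose() reorders the dimensions; to_pandas() turns a 2D array into a DataFrame.
    return count_array.transpose("Gene", "Sample").to_pandas()


count_df = counts_to_gene_by_sample_frame(counts)
```

Here the rows of count_df are genes and the columns are samples, which is the orientation most R count-based tools expect.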
-
GSForge.utils.R_interface.R_counts_to_Py_counts(r_count_array, original_count_array)
Prepares a numpy array (count matrix) for use in GSForge.
This function transposes the data (so that it has samples as rows and genes as columns).
Inverts the conversion provided by Py_counts_to_R.
- Parameters
r_count_array – A numpy array of count values. Presumed to be oriented with genes as rows, and samples as columns.
original_count_array – A copy of the original count array from which coordinates will be drawn.
- Returns
An xarray.DataArray of the count values.
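A rough sketch of the inverse direction, assuming the original array has dimensions ("Sample", "Gene") and that the array returned from R keeps the same gene and sample order. The helper name restore_count_coordinates is hypothetical and this is not the GSForge implementation.

```python
import numpy as np
import xarray as xr

# The original samples-by-genes array supplies the coordinates (illustrative names).
original_counts = xr.DataArray(
    np.zeros((2, 3)),
    dims=("Sample", "Gene"),
    coords={"Sample": ["s1", "s2"], "Gene": ["g1", "g2", "g3"]},
)

# Values as they might come back from R: genes as rows, samples as columns.
r_values = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])


def restore_count_coordinates(r_count_array, original_count_array):
    """Transpose to samples-by-genes and reattach the original coordinates."""
    values = np.asarray(r_count_array).T
    # copy(data=...) keeps dims and coords and swaps in the new values;
    # the new data must have the same shape as the original.
    return original_count_array.copy(data=values)


restored = restore_count_coordinates(r_values, original_counts)
```

Reusing the coordinates of the original array only makes sense if no genes or samples were dropped or reordered on the R side.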
generate_gem Module¶
wrappers Module¶

Some simple wrappers for maintaining xarray coordinates through sklearn functions.
This module may eventually be supplanted by use of sklearn-xarray.
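One way to keep xarray coordinates through a split is to split the positional sample index and then select from the labelled arrays. The sketch below illustrates that idea under assumed dimension names ("Sample", "Gene"); it does not reproduce how the wrappers in this module are implemented.

```python
import numpy as np
import xarray as xr
from sklearn.model_selection import train_test_split

# Illustrative data: 10 samples by 4 genes, with one label per sample.
sample_names = [f"s{i}" for i in range(10)]
x = xr.DataArray(
    np.arange(40).reshape(10, 4),
    dims=("Sample", "Gene"),
    coords={"Sample": sample_names, "Gene": ["g1", "g2", "g3", "g4"]},
)
y = xr.DataArray(
    np.array([0, 1] * 5),
    dims=("Sample",),
    coords={"Sample": sample_names},
)

# Split integer positions rather than raw values, so coordinates are never lost.
positions = np.arange(x.sizes["Sample"])
train_idx, test_idx = train_test_split(positions, test_size=3, random_state=0)

# isel() selects by position and carries the coordinate labels along.
x_train, x_test = x.isel(Sample=train_idx), x.isel(Sample=test_idx)
y_train, y_test = y.isel(Sample=train_idx), y.isel(Sample=test_idx)
```

Because x and y are indexed with the same positions, their "Sample" coordinates stay aligned in both the training and the test partitions.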
-
class GSForge.utils.wrappers.train_test_split_wrapper(*args, **params)
Bases: GSForge.models._OperationInterface.OperationInterface
Performs an sklearn.model_selection.train_test_split() call on the subset of data specified by the interface options (the same options passed to get_data()).
- Returns
x_train, x_test, y_train, y_test
gem = param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>)
An AnnotatedGEM object.
gene_set_collection = param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>)
A GeneSetCollection object.
selected_gene_sets = param.ListSelector(default=[None], objects=[])
A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are to be used for selecting sets of genes from the count matrix.
selected_genes = param.Parameter()
A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selecting methods. That means that selected gene sets (or combinations thereof) will have no effect.
gene_set_mode = param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection'])
Controls how any selected gene sets are returned by the interface.
+ complete: returns the entire gene set of the AnnotatedGEM.
+ union: returns the union of the selected gene sets' support.
+ intersection: returns the intersection of the selected gene sets' support.
sample_subset = param.Parameter()
A list of samples to use in a given operation. These can be supplied directly as a list of samples, or can be drawn from a given GeneSet.
count_variable = param.String()
The name of the count matrix used.
annotation_variables = param.Parameter()
The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.
count_mask = param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped'])
The type of mask to use for the count matrix.
+ 'complete' returns the entire count matrix as numbers.
+ 'masked' returns the entire count matrix with zero or missing as NaN values.
+ 'dropped' returns the count matrix without genes that have zero or missing values.
annotation_mask = param.ObjectSelector(default='complete', objects=['complete', 'dropped'])
The type of mask to use for the target array.
+ 'complete' returns the entire target array.
+ 'masked' returns the entire target array with zero or missing as NaN values.
+ 'dropped' returns the target array without samples that have zero or missing values.
count_transform = param.Callable()
A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
train_test_split_options = param.Parameter(default={})
-
property active_count_variable
Returns the name of the currently active count matrix.
-
debug(**kwargs)¶ Inspect .param.debug method for the full docstring
-
defaults(**kwargs)¶ Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)¶
-
property
gene_index_name¶ Returns the name of the gene index.
-
get_gene_index(count_variable=None) → numpy.core.multiarray.array¶ Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
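Which genes end up in this index depends on the gene_set_mode option of the interface ('complete', 'union' or 'intersection'). The following sketch shows how those three modes could combine the supports of several gene sets using plain Python sets. The function name combine_gene_sets and its arguments are hypothetical and do not correspond to a GSForge function.

```python
def combine_gene_sets(all_genes, gene_set_supports, mode="union"):
    """Combine gene set supports according to a gene_set_mode-like option.

    all_genes: every gene in the count matrix.
    gene_set_supports: a list of iterables, one per selected gene set.
    """
    if mode == "complete":
        # Ignore the selected sets and return every gene.
        return sorted(all_genes)

    supports = [set(support) for support in gene_set_supports]
    if not supports:
        return []
    if mode == "union":
        # Genes present in at least one selected set.
        return sorted(set.union(*supports))
    if mode == "intersection":
        # Genes present in every selected set.
        return sorted(set.intersection(*supports))
    raise ValueError(f"Unknown mode: {mode!r}")


all_genes = ["g1", "g2", "g3", "g4", "g5"]
supports = [["g1", "g2", "g3"], ["g2", "g3", "g4"]]

union_genes = combine_gene_sets(all_genes, supports, mode="union")
shared_genes = combine_gene_sets(all_genes, supports, mode="intersection")
every_gene = combine_gene_sets(all_genes, supports, mode="complete")
```

With these inputs the union covers g1 through g4, the intersection keeps only g2 and g3, and 'complete' returns all five genes regardless of the selected sets.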
-
get_param_values= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)¶
-
get_sample_index() → numpy.core.multiarray.array¶ Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes() → dict¶ Returns the currently selected indexes as a dictionary.
-
get_value_generator= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)¶
-
inspect_value= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)¶
-
instance= functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)¶
-
message(**kwargs)¶ Inspect .param.message method for the full docstring
-
params= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)¶
-
pprint(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')¶ Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod
print_param_defaults(*args, **kwargs)¶ Inspect .param.print_param_defaults method for the full docstring
-
print_param_values(**kwargs)¶ Inspect .param.print_param_values method for the full docstring
-
property
sample_index_name¶ Returns the name of the sample index.
-
script_repr(imports=[], prefix=' ')¶ Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property
selection¶ Returns the currently selected data.
-
classmethod
set_default(*args, **kwargs)¶ Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)¶
-
set_param= functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)¶
-
state_pop()¶ Restore the most recently saved state.
See state_push() for more details.
-
state_push()¶ Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the object's state.
-
verbose(**kwargs)¶ Inspect .param.verbose method for the full docstring
-
warning(**kwargs)¶ Inspect .param.warning method for the full docstring
-
property x_count_data
Returns the currently selected 'x_data'. Usually this will be a subset of the active count array.
- Returns
An xarray.Dataset selection of the currently active 'x_data'.
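The data returned here is shaped by the count_mask option ('complete', 'masked' or 'dropped'). As a sketch of what those three behaviours mean, the example below applies them to a small DataArray. The helper apply_count_mask and the dimension names are assumptions made for illustration, not the GSForge implementation.

```python
import numpy as np
import xarray as xr

# Illustrative samples-by-genes counts containing one zero and one missing value.
counts = xr.DataArray(
    np.array([[1.0, 0.0, 3.0], [4.0, 5.0, np.nan]]),
    dims=("Sample", "Gene"),
    coords={"Sample": ["s1", "s2"], "Gene": ["g1", "g2", "g3"]},
)


def apply_count_mask(count_array, mode="complete"):
    """Mimic the documented count_mask options on a DataArray (hypothetical helper)."""
    if mode == "complete":
        return count_array
    # Keep strictly positive values; zeros and missing values become NaN.
    masked = count_array.where(count_array > 0)
    if mode == "masked":
        return masked
    if mode == "dropped":
        # Remove every gene that has at least one zero or missing value.
        return masked.dropna(dim="Gene", how="any")
    raise ValueError(f"Unknown mode: {mode!r}")


masked_counts = apply_count_mask(counts, "masked")
dropped_counts = apply_count_mask(counts, "dropped")
```

In this example only g1 survives the 'dropped' mode, since g2 contains a zero and g3 contains a missing value.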
-
property y_annotation_data
Returns the currently selected 'y_data', or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.