GSForge.utils Package
utils Package
Utility functions for GSForge.
R_interface Module
This module contains functions for preparing data for transfer to and from the R programming language.
Such functionality is powered by the rpy2 library.
No conversion function is needed for preparing labels, as one can use the built-in xarray method to_dataframe, which returns a pandas.DataFrame:
label_df = labels.to_dataframe()
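As a minimal sketch of the call above, assuming labels is an xarray.Dataset indexed by sample (the variable name "treatment" and the dimension name "Sample" are illustrative, not taken from GSForge):

```python
import xarray as xr

# Hypothetical sample labels; "treatment" and "Sample" are illustrative names.
labels = xr.Dataset(
    {"treatment": ("Sample", ["control", "heat", "heat"])},
    coords={"Sample": ["s1", "s2", "s3"]},
)

# xarray's to_dataframe() returns a pandas.DataFrame indexed by the dimension.
label_df = labels.to_dataframe()
print(label_df)
```

The resulting frame has one row per sample and one column per label variable, which is the layout most R modelling functions expect for sample metadata.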
-
GSForge.utils.R_interface.Py_counts_to_R(counts: xarray.core.dataarray.DataArray)
Prepare a count xarray.DataArray as a pandas.DataFrame for transfer to the R programming language.
This function transposes the data to have genes as rows and samples as columns. It then converts to a pandas.DataFrame and removes extraneous index levels.
The inverse of this function is R_counts_to_Py_counts.
- Parameters
counts – An xr.DataArray count matrix.
- Returns
count_df – A pandas.DataFrame ready to transfer to an R environment.
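The orientation change described above can be sketched without GSForge itself. The helper below is a hypothetical stand-in, not the library's implementation, and it assumes the dimensions are named "Sample" and "Gene":

```python
import numpy as np
import pandas as pd
import xarray as xr


def counts_to_gene_rows(counts: xr.DataArray) -> pd.DataFrame:
    """Hypothetical stand-in for the conversion described above (not GSForge code)."""
    # Assumption: the dimensions are named "Gene" and "Sample".
    genes_by_samples = counts.transpose("Gene", "Sample")
    # Keep only the two dimension indexes as row and column labels.
    return pd.DataFrame(
        genes_by_samples.values,
        index=genes_by_samples["Gene"].values,
        columns=genes_by_samples["Sample"].values,
    )


counts = xr.DataArray(
    np.array([[1, 2, 3], [4, 5, 6]]),
    dims=("Sample", "Gene"),
    coords={"Sample": ["s1", "s2"], "Gene": ["g1", "g2", "g3"]},
)
count_df = counts_to_gene_rows(counts)
print(count_df)
```

Genes end up as rows because many R count-based packages expect a genes-by-samples matrix.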
-
GSForge.utils.R_interface.R_counts_to_Py_counts(r_count_array, original_count_array)
Prepares a numpy array (count matrix) for use in GSForge.
This function transposes the data (so that it has samples as rows and genes as columns).
Inverts the conversion provided by Py_counts_to_R.
- Parameters
r_count_array – A numpy array of count values. Presumed to be oriented with genes as rows, and samples as columns.
original_count_array – A copy of the original count array from which coordinates will be drawn.
- Returns
An xarray.DataArray of the count values.
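A rough sketch of the inverse step, again as a hypothetical stand-in rather than the library's code. It assumes the original array uses dimensions named "Sample" and "Gene", and that the values returned from R keep the same gene and sample order:

```python
import numpy as np
import xarray as xr


def gene_rows_to_counts(r_count_array: np.ndarray, original: xr.DataArray) -> xr.DataArray:
    """Hypothetical stand-in for the inverse conversion (not GSForge code)."""
    # Assumption: the dimensions are named "Sample" and "Gene".
    template = original.transpose("Sample", "Gene")
    # Transpose genes-by-samples values back to samples-by-genes and
    # reuse the coordinates of the original array.
    return template.copy(data=np.asarray(r_count_array).T)


original = xr.DataArray(
    np.zeros((2, 3)),
    dims=("Sample", "Gene"),
    coords={"Sample": ["s1", "s2"], "Gene": ["g1", "g2", "g3"]},
)
r_values = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])  # genes as rows
restored = gene_rows_to_counts(r_values, original)
print(restored)
```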
generate_gem Module
wrappers Module
Some simple wrappers for maintaining xarray coordinates through sklearn functions.
This module may eventually be supplanted by use of sklearn-xarray.
-
class GSForge.utils.wrappers.train_test_split_wrapper(*args, **params)
Bases: GSForge.models._OperationInterface.OperationInterface
Performs an sklearn.model_selection.train_test_split() call on the subset of data specified by the interface options (the same options passed to get_data()).
- Returns
x_train, x_test, y_train, y_test
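One way to keep xarray coordinates through a split is to split the sample labels and then select by label. This is only a sketch under that assumption, not the wrapper's actual implementation; the array contents and the dimension names "Sample" and "Gene" are made up for illustration:

```python
import numpy as np
import xarray as xr
from sklearn.model_selection import train_test_split

# Hypothetical count matrix and sample labels.
counts = xr.DataArray(
    np.arange(24).reshape(8, 3),
    dims=("Sample", "Gene"),
    coords={"Sample": [f"s{i}" for i in range(8)], "Gene": ["g1", "g2", "g3"]},
)
labels = xr.DataArray(
    ["a", "b"] * 4,
    dims="Sample",
    coords={"Sample": counts["Sample"].values},
)

# Split the sample names, then select by label so coordinates are preserved.
train_ids, test_ids = train_test_split(
    counts["Sample"].values, test_size=0.25, random_state=0
)
x_train, x_test = counts.sel(Sample=train_ids), counts.sel(Sample=test_ids)
y_train, y_test = labels.sel(Sample=train_ids), labels.sel(Sample=test_ids)
print(x_train["Sample"].values, x_test["Sample"].values)
```

Selecting by label keeps x and y aligned on the same samples, which is the property the wrapper is meant to maintain.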
gem = param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>)
An AnnotatedGEM object.
gene_set_collection = param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>)
A GeneSetCollection object.
selected_gene_sets = param.ListSelector(default=[None], objects=[])
A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are to be used for selecting sets of genes from the count matrix.
selected_genes = param.Parameter()
A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selecting methods. That means that selected lineaments (or combinations thereof) will have no effect.
gene_set_mode = param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection'])
Controls how any selected gene sets are returned by the interface.
+ complete: Returns the entire gene set of the AnnotatedGEM.
+ union: Returns the union of the selected gene sets support.
+ intersection: Returns the intersection of the selected gene sets support.
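The three modes can be illustrated with plain numpy set operations. The gene names, set keys, and the select_genes helper below are hypothetical and only mirror the behaviour described above:

```python
from functools import reduce

import numpy as np

# Hypothetical gene index and gene set supports.
all_genes = np.array(["g1", "g2", "g3", "g4", "g5"])
gene_sets = {
    "set_a": np.array(["g1", "g2", "g3"]),
    "set_b": np.array(["g2", "g3", "g4"]),
}


def select_genes(mode, selected_keys):
    """Illustrative helper (not GSForge code) combining gene set supports."""
    supports = [gene_sets[key] for key in selected_keys]
    if mode == "complete":
        return all_genes
    if mode == "union":
        return reduce(np.union1d, supports)
    if mode == "intersection":
        return reduce(np.intersect1d, supports)
    raise ValueError(f"Unknown mode: {mode}")


union_genes = select_genes("union", ["set_a", "set_b"])
shared_genes = select_genes("intersection", ["set_a", "set_b"])
print(union_genes, shared_genes)
```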
sample_subset = param.Parameter()
A list of samples to use in a given operation. These can be supplied directly as a list of samples, or can be drawn from a given GeneSet.
count_variable = param.String()
The name of the count matrix used.
annotation_variables = param.Parameter()
The name of the active annotation variable(s). These are the annotation columns that will control the subset returned by y_annotation_data.
count_mask = param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped'])
The type of mask to use for the count matrix.
+ 'complete' returns the entire count matrix as numbers.
+ 'masked' returns the entire count matrix with zero or missing as NaN values.
+ 'dropped' returns the count matrix without genes that have zero or missing values.
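The effect of each mask option can be approximated with standard xarray calls. This is an illustrative sketch on a made-up matrix, not the code GSForge runs internally:

```python
import numpy as np
import xarray as xr

# Hypothetical count matrix with one zero and one missing value.
counts = xr.DataArray(
    np.array([[5.0, 0.0, 3.0], [2.0, 1.0, np.nan]]),
    dims=("Sample", "Gene"),
    coords={"Sample": ["s1", "s2"], "Gene": ["g1", "g2", "g3"]},
)

complete = counts                                # 'complete': values as-is
masked = counts.where(counts > 0)                # 'masked': zero or missing become NaN
dropped = masked.dropna(dim="Gene", how="any")   # 'dropped': remove affected genes
print(dropped["Gene"].values)
```

Here only g1 survives the 'dropped' option, since g2 has a zero count and g3 has a missing value.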
annotation_mask = param.ObjectSelector(default='complete', objects=['complete', 'dropped'])
The type of mask to use for the target array.
+ 'complete' returns the entire target array.
+ 'masked' returns the entire target array with zero or missing as NaN values.
+ 'dropped' returns the target array without samples that have zero or missing values.
count_transform = param.Callable()
A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.
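A transform of this kind is just a callable that takes the selected counts and returns an array of the same shape. The log2_plus_one function below is a hypothetical example, applied here by hand to a subset to mirror the order described above (select first, then transform):

```python
import numpy as np
import xarray as xr


def log2_plus_one(counts):
    """Hypothetical transform: log2(count + 1), applied element-wise."""
    return np.log2(counts + 1)


counts = xr.DataArray(
    np.array([[0.0, 3.0], [7.0, 1.0]]),
    dims=("Sample", "Gene"),
    coords={"Sample": ["s1", "s2"], "Gene": ["g1", "g2"]},
)

# Select the subset first, then apply the transform to that subset only.
selected = counts.sel(Gene=["g2"])
transformed = log2_plus_one(selected)
print(transformed)
```

Because numpy functions applied to a DataArray return a DataArray, the sample and gene coordinates are kept on the transformed values.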
train_test_split_options = param.Parameter(default={})
-
property active_count_variable
Returns the name of the currently active count matrix.
-
debug(**kwargs)
Inspect .param.debug method for the full docstring
-
defaults(**kwargs)
Inspect .param.defaults method for the full docstring
-
force_new_dynamic_value = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
-
property gene_index_name
Returns the name of the gene index.
-
get_gene_index(count_variable=None) → numpy.core.multiarray.array
Get the currently selected gene index as a numpy array.
- Parameters
count_variable – The variable to be retrieved.
- Returns
A numpy array of the currently selected genes.
-
get_param_values = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
-
get_sample_index() → numpy.core.multiarray.array
Get the currently selected sample index as a numpy array.
- Returns
A numpy array of the currently selected samples.
-
get_selection_indexes() → dict
Returns the currently selected indexes as a dictionary.
-
get_value_generator = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
-
inspect_value = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
-
instance = functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
-
message(**kwargs)
Inspect .param.message method for the full docstring
-
params = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
-
pprint(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')
Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y
-
classmethod print_param_defaults(*args, **kwargs)
Inspect .param.print_param_defaults method for the full docstring
-
print_param_values(**kwargs)
Inspect .param.print_param_values method for the full docstring
-
property sample_index_name
Returns the name of the sample index.
-
script_repr(imports=[], prefix=' ')
Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y
-
property selection
Returns the currently selected data.
-
classmethod set_default(*args, **kwargs)
Inspect .param.set_default method for the full docstring
-
set_dynamic_time_fn = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
-
set_param = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
-
state_pop()
Restore the most recently saved state.
See state_push() for more details.
-
state_push()
Save this instance’s state.
For Parameterized instances, this includes the state of dynamically generated values.
Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().
Generally, this method is used by operations that need to test something without permanently altering the objects’ state.
-
verbose(**kwargs)
Inspect .param.verbose method for the full docstring
-
warning(**kwargs)
Inspect .param.warning method for the full docstring
-
property x_count_data
Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.
- Returns
An xarray.Dataset selection of the currently active ‘x_data’.
-
property y_annotation_data
Returns the currently selected ‘y_data’, or None, based on the selected_annotation_variables parameter.
- Returns
An xarray.Dataset or xarray.DataArray object of the currently selected y_data.