GSForge.utils Package


utils Package

Utility functions for GSForge.


R_interface Module

This module contains functions for preparing data for transfer to and from the R programming language.

Such functionality is powered by the rpy2 library.

No conversion function is needed for preparing labels, as one can use the built-in xarray method to_dataframe:

label_df = labels.to_dataframe()
GSForge.utils.R_interface.Py_counts_to_R(counts: xarray.core.dataarray.DataArray)

Prepare a count xarray.DataArray as a pandas.DataFrame for transfer to the R programming language.

This function transposes the data to have genes as rows and samples as columns. It then converts to a pandas.DataFrame and removes extraneous index levels.

The inverse of this function is R_counts_to_Py_counts.

Parameters

counts – An xr.DataArray count matrix.

Returns count_df

A pandas.DataFrame ready to transfer to an R environment.
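
The transformation described above can be sketched with plain xarray and pandas calls. This is an illustrative approximation, not the GSForge implementation: the dimension names Sample and Gene, the example values, and the helper name counts_to_frame are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 2-sample x 3-gene count matrix; the dimension names are assumptions.
counts = xr.DataArray(
    np.array([[5, 0, 12], [7, 3, 9]]),
    dims=("Sample", "Gene"),
    coords={"Sample": ["s1", "s2"], "Gene": ["g1", "g2", "g3"]},
    name="counts",
)


def counts_to_frame(count_array: xr.DataArray) -> pd.DataFrame:
    """Return a genes-by-samples DataFrame (illustrative helper, not GSForge API)."""
    # Put genes on the rows and samples on the columns.
    transposed = count_array.transpose("Gene", "Sample")
    # Build a flat DataFrame so no extra index levels are carried along.
    return pd.DataFrame(
        transposed.values,
        index=transposed["Gene"].values,
        columns=transposed["Sample"].values,
    )


count_df = counts_to_frame(counts)
print(count_df.shape)  # (3, 2): genes as rows, samples as columns
```

A flat, single-level index is typically easier to hand to rpy2's pandas conversion than a multi-indexed frame.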

GSForge.utils.R_interface.R_counts_to_Py_counts(r_count_array, original_count_array)

Prepares a numpy array (count matrix) for use in GSForge.

This function transposes the data (so that it has samples as rows and genes as columns).

Inverts the conversion provided by Py_counts_to_R.

Parameters
  • r_count_array – A numpy array of count values. Presumed to be oriented with genes as rows, and samples as columns.

  • original_count_array – A copy of the original count array from which coordinates will be drawn.

Returns

An xarray.DataArray of the count values.
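
The inverse step can be sketched as follows. This is a minimal illustration under assumed dimension names (Sample, Gene) and made-up values, not the code GSForge actually runs: the array returned from R is transposed back to samples-by-genes, and the coordinates are copied from the original array.

```python
import numpy as np
import xarray as xr

# Hypothetical original array: samples as rows, genes as columns.
original = xr.DataArray(
    np.zeros((2, 3)),
    dims=("Sample", "Gene"),
    coords={"Sample": ["s1", "s2"], "Gene": ["g1", "g2", "g3"]},
)

# Values as they might come back from R: genes as rows, samples as columns.
r_values = np.array([[5.0, 7.0], [0.0, 3.0], [12.0, 9.0]])

# Transpose back to samples x genes and reattach the original coordinates.
restored = xr.DataArray(r_values.T, dims=original.dims, coords=original.coords)
print(restored.sel(Sample="s2", Gene="g3").item())  # 9.0
```

This only works when the R-side operation preserved the order and number of genes and samples, since the coordinates are reattached by position.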


generate_gem Module


wrappers Module

Inheritance diagram of GSForge.utils.wrappers

Some simple wrappers for maintaining xarray coordinates through sklearn functions.

This module may eventually be supplanted by use of the sklearn-xarray package.

class GSForge.utils.wrappers.train_test_split_wrapper(*args, **params)

Bases: GSForge.models._OperationInterface.OperationInterface

Performs an sklearn.model_selection.train_test_split() call on the subset of data specified by the interface options (the same options passed to get_data()).

Returns

x_train, x_test, y_train, y_test
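
For orientation, the underlying scikit-learn call can be sketched on plain NumPy arrays. The wrapper's internals are not shown here, so this is only an assumption-laden illustration: the sample names, array shapes, and the test_size and random_state values are made up for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 6 samples x 4 genes, with one label per sample.
sample_names = np.array(["s1", "s2", "s3", "s4", "s5", "s6"])
x = np.arange(24).reshape(6, 4)
y = np.array([0, 1, 0, 1, 0, 1])

# Splitting the sample names alongside x and y keeps every row traceable
# to its sample after the shuffle.
x_train, x_test, y_train, y_test, names_train, names_test = train_test_split(
    x, y, sample_names, test_size=0.5, random_state=0
)
print(x_train.shape, x_test.shape)  # (3, 4) (3, 4)
```

Carrying the sample labels through the split is the plain-array analogue of keeping xarray coordinates attached to the returned subsets.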

gem = param.ClassSelector(class_=<class 'GSForge.models._AnnotatedGEM.AnnotatedGEM'>)

An AnnotatedGEM object.

gene_set_collection = param.ClassSelector(class_=<class 'GSForge.models._GeneSetCollection.GeneSetCollection'>)

A GeneSetCollection object.

selected_gene_sets = param.ListSelector(default=[None], objects=[])

A list of keys from the provided GeneSetCollection (stored in gene_set_collection) that are to be used for selecting sets of genes from the count matrix.

selected_genes = param.Parameter()

A list of genes to use in indexing from the count matrix. This parameter takes priority over all other gene selection methods: when it is set, any selected gene sets (or combinations thereof) have no effect.

gene_set_mode = param.ObjectSelector(default='union', objects=['complete', 'union', 'intersection'])

Controls how any selected gene sets are returned by the interface.
  • complete: Returns the entire gene set of the AnnotatedGEM.
  • union: Returns the union of the selected gene sets' support.
  • intersection: Returns the intersection of the selected gene sets' support.
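
The three modes correspond to ordinary set operations. The sketch below uses plain Python sets with hypothetical gene names and set keys. It illustrates the semantics only and is not the GSForge implementation.

```python
# Hypothetical gene sets and full gene index; names are illustrative only.
all_genes = {"g1", "g2", "g3", "g4", "g5"}
gene_sets = {"set_a": {"g1", "g2", "g3"}, "set_b": {"g2", "g3", "g4"}}


def select_genes(mode: str) -> list:
    """Sketch of the three modes (not the GSForge implementation)."""
    if mode == "complete":
        selected = all_genes
    elif mode == "union":
        selected = set.union(*gene_sets.values())
    elif mode == "intersection":
        selected = set.intersection(*gene_sets.values())
    else:
        raise ValueError(f"Unknown mode: {mode}")
    return sorted(selected)


print(select_genes("union"))         # ['g1', 'g2', 'g3', 'g4']
print(select_genes("intersection"))  # ['g2', 'g3']
```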

sample_subset = param.Parameter()

A list of samples to use in a given operation. These can be supplied directly as a list of samples, or can be drawn from a given GeneSet.

count_variable = param.String()

The name of the count matrix used.

annotation_variables = param.Parameter()

The name of the active annotation variable(s). These are the annotation columns that control the subset returned by y_annotation_data.

count_mask = param.ObjectSelector(default='complete', objects=['complete', 'masked', 'dropped'])

The type of mask to use for the count matrix.
  • 'complete' returns the entire count matrix as numbers.
  • 'masked' returns the entire count matrix with zero or missing as NaN values.
  • 'dropped' returns the count matrix without genes that have zero or missing values.
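
The effect of each mask can be approximated with NumPy on a small samples-by-genes matrix. The values are invented, and the sketch assumes that 'dropped' removes a gene if any sample has a zero or missing count for it. The description above does not state whether "any" or "all" is intended, so treat that choice as an assumption.

```python
import numpy as np

# Hypothetical samples x genes matrix; the second gene has a zero count.
counts = np.array([[5.0, 0.0, 12.0], [7.0, 3.0, 9.0]])

# 'complete': the values as-is.
complete = counts

# 'masked': zero (or missing) entries become NaN.
masked = np.where(counts > 0, counts, np.nan)

# 'dropped': keep only genes with no NaN in any sample (assumed semantics).
keep = ~np.isnan(masked).any(axis=0)
dropped = masked[:, keep]

print(dropped.shape)  # (2, 2)
```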

annotation_mask = param.ObjectSelector(default='complete', objects=['complete', 'dropped'])

The type of mask to use for the target array.
  • 'complete' returns the entire target array.
  • 'masked' returns the entire target array with zero or missing as NaN values.
  • 'dropped' returns the target array without samples that have zero or missing values.

count_transform = param.Callable()

A transform that will be run on the x_data that is supplied by this Interface. The transform runs on the subset of the matrix that has been selected.

train_test_split_options = param.Parameter(default={})

property active_count_variable

Returns the name of the currently active count matrix.

debug(**kwargs)

Inspect .param.debug method for the full docstring

defaults(**kwargs)

Inspect .param.defaults method for the full docstring

force_new_dynamic_value = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
property gene_index_name

Returns the name of the gene index.

get_gene_index(count_variable=None) → numpy.core.multiarray.array

Get the currently selected gene index as a numpy array.

Parameters

count_variable – The variable to be retrieved.

Returns

A numpy array of the currently selected genes.

get_param_values = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
get_sample_index() → numpy.core.multiarray.array

Get the currently selected sample index as a numpy array.

Returns

A numpy array of the currently selected samples.

get_selection_indexes() → dict

Returns the currently selected indexes as a dictionary.

get_value_generator = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
inspect_value = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
instance = functools.partial(<function ParameterizedFunction.instance>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
message(**kwargs)

Inspect .param.message method for the full docstring

params = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
pprint(imports=None, prefix='\n ', unknown_value='<?>', qualify=False, separator='')

Same as Parameterized.pprint, except that X.classname(Y is replaced with X.classname.instance(Y

classmethod print_param_defaults(*args, **kwargs)

Inspect .param.print_param_defaults method for the full docstring

print_param_values(**kwargs)

Inspect .param.print_param_values method for the full docstring

process()

Abstract process.

property sample_index_name

Returns the name of the sample index.

script_repr(imports=[], prefix=' ')

Same as Parameterized.script_repr, except that X.classname(Y is replaced with X.classname.instance(Y

property selection

Returns the currently selected data.

classmethod set_default(*args, **kwargs)

Inspect .param.set_default method for the full docstring

set_dynamic_time_fn = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
set_param = functools.partial(<function Parameters.deprecate.<locals>.inner>, <class 'GSForge.utils.wrappers.train_test_split_wrapper'>)
state_pop()

Restore the most recently saved state.

See state_push() for more details.

state_push()

Save this instance’s state.

For Parameterized instances, this includes the state of dynamically generated values.

Subclasses that maintain short-term state should additionally save and restore that state using state_push() and state_pop().

Generally, this method is used by operations that need to test something without permanently altering the objects’ state.

verbose(**kwargs)

Inspect .param.verbose method for the full docstring

warning(**kwargs)

Inspect .param.warning method for the full docstring

property x_count_data

Returns the currently selected ‘x_data’. Usually this will be a subset of the active count array.

Returns

An xarray.Dataset selection of the currently active 'x_data'.

property y_annotation_data

Returns the currently selected 'y_data', or None, based on the annotation_variables parameter.

Returns

An xarray.Dataset or xarray.DataArray object of the currently selected y_data.