GSForge.operations.normalizations module

Normalization functions inherit from the OperationInterface class, so each of them can be called on either an AnnotatedGEM or a GeneSetCollection.

These classes behave like functions, and each one exposes static methods that apply its transform directly to a NumPy array or an xarray DataArray.

class GSForge.operations.normalizations.ReadsPerKilobaseMillion(*args, **params)

Bases: GSForge.models._Interface.Interface, param.parameterized.ParameterizedFunction

RPKM or FPKM: Reads (or Fragments) Per Kilobase of transcript per Million mapped reads.

These methods attempt to compensate for both sequencing depth and gene length. Their utility is disputed in the literature [cite me].
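For reference, the usual RPKM definition is given below, with samples indexed by $i$ and genes by $j$. Here $x_{ij}$ is the raw count, $\ell_j$ is the gene length in base pairs (presumably read from the variable named by `length_variable`), and $N_i$ is the sequencing depth of sample $i$. Taking $N_i$ as the sum of counts in the matrix is an assumption; some pipelines use the total number of mapped reads instead. This page does not state whether GSForge uses exactly this form. FPKM uses the same formula with fragment counts in place of read counts.

```latex
\mathrm{RPKM}_{ij}
  = \frac{x_{ij}}{\left(\ell_j / 10^{3}\right)\left(N_i / 10^{6}\right)}
  = \frac{10^{9}\, x_{ij}}{\ell_j \, N_i},
\qquad
N_i = \sum_{k} x_{ik}
```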

Parameters inherited from:

GSForge.models._Interface.Interface: gem, gene_set_collection, selected_gene_sets, selected_genes, gene_set_mode, sample_subset, count_variable, annotation_variables, count_mask, annotation_mask, count_transform

length_variable = param.String(readonly=False)

length_variable = 'lengths'
static xr_reads_per_kilobase_million(counts, lengths, sample_dim='Sample')
static np_reads_per_kilobase_million(counts, lengths)
name = 'ReadsPerKilobaseMillion'
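The transform can be sketched in NumPy as follows. The function name `rpkm_sketch` is hypothetical and this is not GSForge's `np_reads_per_kilobase_million`. It assumes `counts` has shape (samples, genes), `lengths` holds gene lengths in base pairs with shape (genes,), and sequencing depth is the per-sample sum of the matrix.

```python
import numpy as np


def rpkm_sketch(counts, lengths):
    """Hypothetical RPKM sketch (not GSForge's np_reads_per_kilobase_million).

    counts:  raw read counts, shape (samples, genes)  -- assumed orientation
    lengths: gene lengths in base pairs, shape (genes,)
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    # Sequencing depth per sample in millions (assumed: sum of the matrix rows).
    # keepdims gives shape (samples, 1) so the division broadcasts across genes.
    depth_millions = counts.sum(axis=1, keepdims=True) / 1e6
    # Reads per million corrects for sequencing depth ...
    rpm = counts / depth_millions
    # ... and dividing by length in kilobases corrects for gene length.
    return rpm / (lengths / 1e3)


counts = np.array([[10, 20, 70],
                   [5, 5, 40]])
lengths = np.array([1000, 2000, 500])
rpkm = rpkm_sketch(counts, lengths)
print(rpkm)
```

In the first sample, the first two genes get the same RPKM: the second gene has twice the reads but is also twice as long. A sample whose counts sum to zero would cause a division by zero in this sketch.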
class GSForge.operations.normalizations.UpperQuartile(*args, **params)

Bases: GSForge.models._Interface.Interface, param.parameterized.ParameterizedFunction

Under this normalization method, genes with zero read counts in every sample are removed first. Each remaining count is then divided by the upper quartile of the nonzero counts in its sample (the sample's normalization factor) and multiplied by the mean upper quartile across all samples of the dataset [method_compare].
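Written as a formula for a count matrix with samples $i = 1, \dots, n$ and genes $j$, the description above reads as follows. Here $Q_i$ is the 75th percentile of the nonzero counts of sample $i$. Because zeros are excluded from each $Q_i$, genes that are zero in every sample do not affect the factors. Dividing by $Q_i / \bar{Q}$ is the same as the `f.uq` scaling in the R code.

```latex
Q_i = \operatorname{quantile}_{0.75}\left\{\, x_{ij} : x_{ij} > 0 \,\right\},
\qquad
\bar{Q} = \frac{1}{n} \sum_{k=1}^{n} Q_k,
\qquad
\tilde{x}_{ij} = \frac{x_{ij}}{Q_i}\, \bar{Q}
```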

Original R code.

uq<-function(X){

  #excluding zero counts in each sample
  UQ<-function(y){
    quantile(y, 0.75)
  }
  X<-X+0.1
  upperQ<-apply(X,2,UQ)
  f.uq<-upperQ/mean(upperQ)
  upq.res<-scale(X,center=FALSE,scale=f.uq)
  return(upq.res)
}
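A line-by-line NumPy translation of the R function above may help when comparing it with the Python methods. Note that, despite its "#excluding zero counts" comment, the R code takes each quantile over every value in the column, including the zeros that the 0.1 pseudocount has shifted. The translation reproduces that behaviour as written. It keeps the R orientation (genes as rows, samples as columns), which is the transpose of the (samples, genes) shape documented for the static methods below. The name `uq_from_r` is hypothetical.

```python
import numpy as np


def uq_from_r(X):
    """Translation of the R `uq` function above (hypothetical name).

    X: counts with shape (genes, samples), the R orientation, where
    apply(X, 2, UQ) works column-wise, i.e. once per sample.
    """
    X = np.asarray(X, dtype=float) + 0.1   # X <- X + 0.1 (pseudocount)
    # apply(X, 2, UQ): 75th percentile of each column. NumPy's default
    # "linear" method matches R's default quantile type 7.
    upper_q = np.quantile(X, 0.75, axis=0)
    f_uq = upper_q / upper_q.mean()        # f.uq <- upperQ / mean(upperQ)
    # scale(X, center=FALSE, scale=f.uq) divides column k by f.uq[k].
    return X / f_uq


X = np.array([[0, 0],
              [10, 20],
              [20, 40],
              [30, 60],
              [40, 80]])
res = uq_from_r(X)
print(np.quantile(res, 0.75, axis=0))
```

After normalization, every sample (column) has the same upper quartile, equal to the mean of the original upper quartiles (45.1 here).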
[method_compare] A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data.

Parameters inherited from:

GSForge.models._Interface.Interface: gem, gene_set_collection, selected_gene_sets, selected_genes, gene_set_mode, sample_subset, count_variable, annotation_variables, count_mask, annotation_mask, count_transform

static np_upper_quartile(counts)

Perform the upper quartile normalization.

Parameters

counts – A NumPy array containing the raw count values. The shape is assumed to be (samples, genes). Zero counts are expected to be stored as zeros rather than as missing values.

Returns

The upper quartile normalized count matrix.
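The prose description (rather than the R snippet) can be sketched for a (samples, genes) array as follows. This is a hypothetical illustration, not GSForge's `np_upper_quartile`: it uses no pseudocount and excludes zeros from each quantile. Because of that exclusion, all-zero genes cannot influence the factors. The sketch therefore keeps them as zero columns instead of dropping them, which is an assumption about the output shape. It also assumes every sample has at least one nonzero count.

```python
import numpy as np


def upper_quartile_sketch(counts):
    """Hypothetical sketch following the prose description of UpperQuartile.

    counts: raw counts with shape (samples, genes); zeros are stored as 0.
    Assumes every sample has at least one nonzero count.
    """
    counts = np.asarray(counts, dtype=float)
    # Upper quartile of the nonzero counts of each sample (row). Genes that
    # are zero in every sample are already excluded by the `row > 0` mask.
    upper_q = np.array([np.quantile(row[row > 0], 0.75) for row in counts])
    # Divide each sample by its own upper quartile, then multiply by the
    # mean upper quartile across all samples.
    return counts / upper_q[:, np.newaxis] * upper_q.mean()


counts = np.array([[0, 10, 20, 30, 40],
                   [0, 20, 40, 60, 80]])
normalized = upper_quartile_sketch(counts)
print(normalized)
```

Here the second sample is the first one at twice the depth. The upper quartiles are 32.5 and 65 (mean 48.75), so after normalization both samples become `[0, 15, 30, 45, 60]`.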

static xr_upper_quartile(counts)

Perform the upper quartile normalization.

Parameters

counts – An xarray.DataArray containing the raw count values. The shape is assumed to be (samples, genes). Zero counts are expected to be stored as zeros rather than as missing values.

Returns

The upper quartile normalized count matrix.

name = 'UpperQuartile'