Using EdgeR to create Normalizations¶

This notebook provides an example of using data selected from GSForge in an R session, and returning the results back to Python to store and view the results with GSForge.

import xarray as xr
import GSForge as gsf
import holoviews as hv
hv.extension('bokeh')

R integration setup

import rpy2.rinterface_lib.callbacks
import logging
from rpy2.robjects import pandas2ri
%load_ext rpy2.ipython
pandas2ri.activate()
rpy2.rinterface_lib.callbacks.logger.setLevel(logging.ERROR) # Supresses verbose R output.

%%R
library("edgeR")

Declare paths used

# OS-independent path management.
from os import fspath, environ
from pathlib import Path
OSF_PATH = Path(environ.get("GSFORGE_DEMO_DATA", default="~/GSForge_demo_data/")).expanduser().joinpath("osfstorage", "oryza_sativa")
GEM_PATH = OSF_PATH.joinpath("AnnotatedGEMs", "oryza_sativa_hisat2_raw.nc")

Load an AnnotatedGEM

agem = gsf.AnnotatedGEM(GEM_PATH)
agem

Select counts using get_gem_data()

counts, _ = gsf.get_gem_data(agem)

Prepare the counts for R

Notice the counts are transposed after this step to the form more common in R. (features by samples).

ri_counts = gsf.utils.R_interface.Py_counts_to_R(counts)
ri_counts.shape

Run the normalization within R

%%R -i ri_counts -o tmm_counts

dge_list <- DGEList(counts=ri_counts)
dge_list <- calcNormFactors(dge_list, method="TMM")
tmm_counts <- cpm(dge_list, normalized.lib.sizes=TRUE, log=FALSE)

tmm_counts = xr.DataArray(tmm_counts.T, coords=counts.coords, name='tmm_counts')
tmm_counts

Add the counts to the GEM .data attribute.

agem.data['tmm_counts'] = tmm_counts