GEM Normalization
This notebook is a how-to guide for normalizing gene expression matrices (GEMs) with GSForge. It does not cover how to choose which normalization to perform.
Setting up the notebook
In [1]:
import os
import GSForge as gsf
from pathlib import Path
import numpy as np
import holoviews as hv
hv.extension("bokeh")
Declare paths used
In [2]:
# OS-independent path management.
from os import environ
from pathlib import Path
In [3]:
OSF_PATH = Path(environ.get("GSFORGE_DEMO_DATA", default="~/GSForge_demo_data")).expanduser()
AGEM_PATH = OSF_PATH.joinpath("osfstorage", "rice.nc")
assert AGEM_PATH.exists()
Load an AnnotatedGEM
In [4]:
agem = gsf.AnnotatedGEM(AGEM_PATH)
agem
Out[4]:
Saving Normalizations to the AnnotatedGEM
If a normalization is expensive to compute, it can be worth saving the result to the AnnotatedGEM object.
In [5]:
uq_counts = gsf.operations.UpperQuartile(agem)
agem.data["uq_counts"] = uq_counts
agem.data
Out[5]:
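To make the idea of an upper-quartile normalization concrete, here is a minimal NumPy sketch. It is not GSForge's implementation: the function name `upper_quartile_normalize`, the choice to ignore zero counts, and the rescaling by the mean quartile are all assumptions made for illustration, and the toy matrix is invented.

```python
import numpy as np


def upper_quartile_normalize(counts: np.ndarray) -> np.ndarray:
    """Scale each sample (row) by the 75th percentile of its non-zero counts.

    Hypothetical helper for illustration; GSForge's own UpperQuartile
    operation may differ in its details.
    """
    counts = np.asarray(counts, dtype=float)
    # Mask zeros so that unexpressed genes do not pull the quartile down.
    nonzero = np.where(counts > 0, counts, np.nan)
    upper_quartiles = np.nanpercentile(nonzero, 75, axis=1, keepdims=True)
    # Rescale by the mean quartile so values stay on a count-like scale.
    return counts / upper_quartiles * upper_quartiles.mean()


# Toy matrix: 3 samples (rows) x 5 genes (columns).
# The second sample is the first one sequenced at twice the depth.
toy_counts = np.array([
    [0, 10, 20, 30, 40],
    [0, 20, 40, 60, 80],
    [5, 15, 25, 35, 45],
])
normalized = upper_quartile_normalize(toy_counts)

# After normalization the depth difference between the first two samples is gone.
print(np.allclose(normalized[0], normalized[1]))
```

The sketch treats rows as samples and columns as genes; if the matrix is stored the other way round, the `axis` argument would need to change accordingly.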
Save the AnnotatedGEM as a netCDF (.nc) file
In [6]:
# agem.save(AGEM_PATH)
Viewing the effect of transforms and normalizations.
In [7]:
gsf.plots.ScatterDistributionBase(agem, count_variable="uq_counts", datashade_=True).opts(
    hv.opts.Area(bgcolor="lightgrey", show_grid=True, show_legend=False, alpha=0.25),
    hv.opts.Area("dist_x", width=150),
    hv.opts.Area("dist_y", height=150),
    hv.opts.RGB(width=500, height=500, bgcolor="lightgrey", show_grid=True),
)
Out[7]:
In [8]:
gsf.plots.ScatterDistributionBase(agem, count_variable="counts", datashade_=True).opts(
    hv.opts.Area(bgcolor="lightgrey", show_grid=True, show_legend=False, alpha=0.25),
    hv.opts.Area("dist_x", width=150),
    hv.opts.Area("dist_y", height=150),
    hv.opts.RGB(width=500, height=500, bgcolor="lightgrey", show_grid=True),
)
Out[8]:
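A numeric summary can complement the two plots. What the plot axes encode is not stated in this notebook, so the sketch below assumes a per-gene mean-versus-variance view on a log scale, which is a common way to compare a matrix before and after normalization. The helper `genewise_log_mean_variance`, the simulated counts, and the simple quartile scaling are all hypothetical and stand in for `agem` and the GSForge operation.

```python
import numpy as np


def genewise_log_mean_variance(counts):
    """Return per-gene mean and variance of log2(counts + 1) across samples."""
    logged = np.log2(np.asarray(counts, dtype=float) + 1.0)
    return logged.mean(axis=0), logged.var(axis=0)


rng = np.random.default_rng(0)

# Simulated data: 6 samples (rows) x 100 genes (columns), where each sample
# has a different sequencing depth applied to the same expression profile.
base_expression = rng.gamma(shape=2.0, scale=50.0, size=100)
depth = np.array([0.5, 0.8, 1.0, 1.2, 1.5, 2.0])[:, None]
raw = rng.poisson(base_expression * depth)

# Simple stand-in normalization: bring every sample to a common upper quartile.
quartiles = np.percentile(raw, 75, axis=1, keepdims=True)
scaled = raw / quartiles * quartiles.mean()

raw_mean, raw_var = genewise_log_mean_variance(raw)
scaled_mean, scaled_var = genewise_log_mean_variance(scaled)

# Depth differences inflate per-gene variance; scaling should reduce it.
print("median per-gene variance, raw:   ", np.median(raw_var))
print("median per-gene variance, scaled:", np.median(scaled_var))
```

In this simulation the only systematic difference between samples is depth, so the scaled matrix should show a lower median per-gene variance than the raw one. Real data will also contain biological variation that normalization is not meant to remove.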