DGE Log-Fold Change vs Mean
Plotting log-fold change against mean expression is a common way to visualize the results of a differential gene expression (DGE) analysis.
Plotting Guide Setup
A shared setup for all plotting guides.
# OS-independent path management.
from os import environ
from pathlib import Path
import numpy as np
import GSForge as gsf
import holoviews as hv
hv.extension('bokeh')
OSF_PATH = Path(environ.get("GSFORGE_DEMO_DATA", default="~/GSForge_demo_data/")).expanduser().joinpath("osfstorage", "oryza_sativa")
GEM_PATH = OSF_PATH.joinpath("AnnotatedGEMs", "oryza_sativa_hisat2_raw.nc")
TOUR_DGE = OSF_PATH.joinpath("GeneSetCollections", "tour_DGE")
agem = gsf.AnnotatedGEM(GEM_PATH)
agem
<GSForge.AnnotatedGEM>
Name: Oryza Sativa
Selected GEM Variable: 'counts'
Gene 66338
Sample 475
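The path setup above reads the GSFORGE_DEMO_DATA environment variable and falls back to a directory in the user's home. The sketch below isolates that lookup so the override can be checked without touching the real environment. The helper name demo_data_root and the directory /data/gsforge_demo are hypothetical, introduced only for illustration.

```python
import os
from pathlib import Path


def demo_data_root(env=None):
    """Return the demo data root, honoring GSFORGE_DEMO_DATA when it is set."""
    env = os.environ if env is None else env
    return Path(env.get("GSFORGE_DEMO_DATA", "~/GSForge_demo_data/")).expanduser()


# Simulate an override by passing a plain dict instead of os.environ.
root = demo_data_root({"GSFORGE_DEMO_DATA": "/data/gsforge_demo"})
gem_path = root.joinpath(
    "osfstorage", "oryza_sativa", "AnnotatedGEMs", "oryza_sativa_hisat2_raw.nc"
)

# Check for the file before loading so a wrong path gives a readable message.
if not gem_path.exists():
    print(f"Demo data not found at {gem_path}")
```

Checking gem_path.exists() before constructing the AnnotatedGEM makes a misconfigured data directory easier to diagnose.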
Load Differential Gene Expression Analysis Results into a GeneSetCollection
deg_gsc = gsf.GeneSetCollection.from_folder(gem=agem, target_dir=TOUR_DGE, name="DEG Results")
deg_gsc
<GSForge.GeneSetCollection>
DEG Results
GeneSets (9 total): Support Count
edgeR filter: 21915
'0 + treatment:genotype'__treatment[HEAT]: 4071
'0 + treatment:genotype'__treatment[RECOV_HEAT]: 3719
'0 + treatment:genotype'__treatment[DROUGHT]: 3703
'0 + treatment:genotype'__treatment[RECOV_DROUGHT]: 2806
... and 4 more.
Select a particular result set of interest
deg_gs = deg_gsc.gene_sets["'0 + treatment:genotype'__treatment[HEAT]"]
deg_gs
<GSForge.GeneSet>
Name: '0 + treatment:genotype'__treatment[HEAT]
Supported Genes: 4071
View the data stored within this GeneSet result
deg_gs.data
<xarray.Dataset>
Dimensions:      (Gene: 88253)
Coordinates:
  * Gene         (Gene) object 'ChrSy.fgenesh.gene.21' ... 'LOC_Os12g44390.1'
Data variables:
    logFC        (Gene) float64 -0.1469 -1.447 -1.385 -1.688 ... nan 0.4425 nan
    logCPM       (Gene) float64 0.3729 -0.4618 3.432 -0.06614 ... nan 4.771 nan
    F            (Gene) float64 0.1518 7.624 12.3 11.13 ... nan nan 26.95 nan
    PValue       (Gene) float64 0.697 0.005992 0.0004969 ... nan 3.146e-07 nan
    FDR          (Gene) float64 0.7511 0.01057 0.001043 ... nan 9.823e-07 nan
    support_dir  (Gene) float64 0.0 0.0 0.0 0.0 1.0 0.0 ... nan nan nan 0.0 nan
    support      (Gene) float64 0.0 0.0 0.0 0.0 1.0 0.0 ... nan nan nan 0.0 nan
Attributes:
    __GSForge.GeneSet.params:  {"gene_index_name": "Gene", "name": "'0 + treatment:genotype'__treatment[HEAT]", "support_index_name": "support"}
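The output above contains NaN entries and a support flag, and both matter when working with the values directly. The sketch below uses small toy arrays, not the real results, in place of the logFC and support variables. It assumes that NaN marks a gene with no result and that a support value of 1 marks a gene belonging to the set; the output suggests this but does not state it.

```python
import numpy as np

# Toy stand-ins for two variables of deg_gs.data (illustrative values only).
log_fc = np.array([-0.2, -1.5, 2.0, np.nan, 0.5])
support = np.array([0.0, 1.0, 1.0, np.nan, 0.0])

# Genes with a result: NaN is assumed to mean "no result for this gene".
has_result = ~np.isnan(log_fc)

# Genes flagged as supported; a comparison against NaN evaluates to False,
# so genes without a result are excluded automatically.
supported = support == 1.0

n_with_result = int(has_result.sum())
n_supported = int(supported.sum())

# Mean absolute log-fold change among supported genes.
mean_abs_lfc = float(np.abs(log_fc[supported]).mean())
```

With the real dataset, the same masks could presumably be built from deg_gs.data["logFC"].values and deg_gs.data["support"].values.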
Plot gene means vs log-fold change
In some cases GSForge can infer the names of the relevant variables; otherwise you will need to pass them explicitly via log_fold_change_var, mean_value_var, and p_value_var.
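When the names cannot be inferred, an explicit call might look like the sketch below. The keyword names come from the sentence above, and the values are variable names shown in deg_gs.data. Which variable GSForge expects for each argument is an assumption here (PValue could reasonably be swapped for FDR), and plot_mean_vs_lfc is a hypothetical wrapper, not part of GSForge.

```python
# Assumed mapping from plot arguments to variables in deg_gs.data.
PLOT_VARS = {
    "log_fold_change_var": "logFC",
    "mean_value_var": "logCPM",
    "p_value_var": "PValue",
}


def plot_mean_vs_lfc(gene_set, **overrides):
    """Call MeanVsLFC with explicit variable names (hypothetical wrapper)."""
    # Imported lazily so this block can be read and run without GSForge installed.
    import GSForge as gsf

    # Caller-supplied overrides take precedence over the assumed defaults.
    kwargs = {**PLOT_VARS, **overrides}
    return gsf.plots.results.MeanVsLFC(gene_set, **kwargs)
```

With the GeneSet loaded above, this would be called as plot_mean_vs_lfc(deg_gs), or as plot_mean_vs_lfc(deg_gs, p_value_var="FDR") to plot against the adjusted values.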
gsf.plots.results.MeanVsLFC(deg_gs)
/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/GSForge/plots/results/_lfc_vs_mean.py:114: RuntimeWarning: invalid value encountered in log10
"mean": np.log10(data[mean_value_var].values),