DGE Log-Fold Change vs Mean

Plotting log-fold change against mean expression is a common way to visualize the results of a differential gene expression (DGE) analysis.

Plotting Guide Setup

This setup is shared by all plotting guides.

# OS-independent path management.
from os import environ
from pathlib import Path
import numpy as np
import GSForge as gsf
import holoviews as hv
hv.extension('bokeh')

OSF_PATH = Path(environ.get("GSFORGE_DEMO_DATA", default="~/GSForge_demo_data/")).expanduser().joinpath("osfstorage", "oryza_sativa")
GEM_PATH = OSF_PATH.joinpath("AnnotatedGEMs", "oryza_sativa_hisat2_raw.nc")
TOUR_DGE = OSF_PATH.joinpath("GeneSetCollections", "tour_DGE")
agem = gsf.AnnotatedGEM(GEM_PATH)
agem
<GSForge.AnnotatedGEM>
Name: Oryza Sativa
Selected GEM Variable: 'counts'
    Gene   66338
    Sample 475

Load Differential Gene Expression Analysis Results into a GeneSetCollection

deg_gsc = gsf.GeneSetCollection.from_folder(gem=agem, target_dir=TOUR_DGE, name="DEG Results")
deg_gsc
<GSForge.GeneSetCollection>
DEG Results
GeneSets (9 total): Support Count
    edgeR filter: 21915
    '0 + treatment:genotype'__treatment[HEAT]: 4071
    '0 + treatment:genotype'__treatment[RECOV_HEAT]: 3719
    '0 + treatment:genotype'__treatment[DROUGHT]: 3703
    '0 + treatment:genotype'__treatment[RECOV_DROUGHT]: 2806
    ... and 4 more.

Select a particular result set of interest

deg_gs = deg_gsc.gene_sets["'0 + treatment:genotype'__treatment[HEAT]"]
deg_gs
<GSForge.GeneSet>
Name: '0 + treatment:genotype'__treatment[HEAT]
    Supported Genes:  4071

View the data stored within this GeneSet result

deg_gs.data
<xarray.Dataset>
Dimensions:      (Gene: 88253)
Coordinates:
  * Gene         (Gene) object 'ChrSy.fgenesh.gene.21' ... 'LOC_Os12g44390.1'
Data variables:
    logFC        (Gene) float64 -0.1469 -1.447 -1.385 -1.688 ... nan 0.4425 nan
    logCPM       (Gene) float64 0.3729 -0.4618 3.432 -0.06614 ... nan 4.771 nan
    F            (Gene) float64 0.1518 7.624 12.3 11.13 ... nan nan 26.95 nan
    PValue       (Gene) float64 0.697 0.005992 0.0004969 ... nan 3.146e-07 nan
    FDR          (Gene) float64 0.7511 0.01057 0.001043 ... nan 9.823e-07 nan
    support_dir  (Gene) float64 0.0 0.0 0.0 0.0 1.0 0.0 ... nan nan nan 0.0 nan
    support      (Gene) float64 0.0 0.0 0.0 0.0 1.0 0.0 ... nan nan nan 0.0 nan
Attributes:
    __GSForge.GeneSet.params:  {"gene_index_name": "Gene", "name": "'0 + trea...
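
To make the layout of this dataset concrete, the following minimal sketch mimics three of its variables with small NumPy arrays and shows one way to count and select supported genes. The gene names and values are invented for illustration. Reading support == 1.0 as "supported" and NaN as "no value for this gene" is an assumption based on the printed output, not a statement about GSForge internals.

```python
import numpy as np

# Toy stand-ins for the Gene coordinate and three data variables.
# All names and values here are illustrative, not taken from the real data.
genes = np.array(["gene_a", "gene_b", "gene_c", "gene_d", "gene_e"])
log_fc = np.array([-0.15, -1.45, 2.10, np.nan, 0.44])
fdr = np.array([0.75, 0.011, 0.0004, np.nan, 0.20])
support = np.array([0.0, 1.0, 1.0, np.nan, 0.0])

# Genes that carry any result at all (assumed: NaN means "no value").
tested = ~np.isnan(support)

# Genes flagged as supported. NaN == 1.0 evaluates to False, so
# genes without a result are excluded automatically.
supported = support == 1.0

n_tested = int(tested.sum())
n_supported = int(supported.sum())
supported_genes = genes[supported]

# A stricter, user-chosen cut-off layered on top of the support flag.
strict = supported & (fdr < 0.001) & (np.abs(log_fc) > 1.0)
strict_genes = genes[strict]

print(n_tested, n_supported, supported_genes.tolist(), strict_genes.tolist())
```

On the real data the same masks could be built from deg_gs.data["support"].values and its sibling variables.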

Plot gene means vs log-fold change

In some cases we can infer the names of the dimensions; otherwise, you will need to pass values to log_fold_change_var, mean_value_var, and p_value_var.
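
The kind of name inference described here can be sketched as a lookup of known column names. This is a hypothetical helper written for illustration: the function infer_variable_names and the candidate lists are assumptions, and GSForge's actual inference rules may differ.

```python
# Hypothetical candidate names per role (edgeR-style first, DESeq2-style second).
# These lists are an assumption, not GSForge's real lookup table.
CANDIDATES = {
    "log_fold_change_var": ("logFC", "log2FoldChange"),
    "mean_value_var": ("logCPM", "baseMean"),
    "p_value_var": ("PValue", "pvalue"),
}


def infer_variable_names(available, candidates=CANDIDATES):
    """Return (inferred, missing): matched names per role and unmatched roles."""
    available = set(available)
    inferred, missing = {}, []
    for role, names in candidates.items():
        # Take the first candidate that exists among the available variables.
        match = next((name for name in names if name in available), None)
        if match is None:
            missing.append(role)
        else:
            inferred[role] = match
    return inferred, missing


# Variable names as printed for the GeneSet above: everything resolves.
edger_like = ["logFC", "logCPM", "F", "PValue", "FDR", "support_dir", "support"]
inferred, missing = infer_variable_names(edger_like)

# Made-up names that match no candidate: each role must be supplied by hand.
custom = ["lfc", "mean_expr", "p"]
_, unresolved = infer_variable_names(custom)

print(inferred, missing, unresolved)
```

For the dataset shown above, the variables corresponding to the three parameters would be logFC, logCPM, and PValue.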

gsf.plots.results.MeanVsLFC(deg_gs)
/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/GSForge/plots/results/_lfc_vs_mean.py:114: RuntimeWarning: invalid value encountered in log10
  "mean": np.log10(data[mean_value_var].values),