Active subnetwork detection Quick Start
Install and Load the wActNet
Package
Begin by installing and loading the devtools
package.
Use the install_github function to install wActNet
from
GitHub
as shown below. Our package depends on two
additional R packages, igraph
and BioNet
, and
the second one can be downloaded from BioConductor
https://www.bioconductor.org/install/.
#please comment the below code out if you already install the package-wActNet from github
#library(devtools)
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("pathview")
#install_github("yuy113/wActNet")
#BiocManager::install(c("BioNet"))
#foreign, MASS, nlme, survival
library(wActNet)
#R package BioNet from BioConductor
library(BioNet)
## Loading required package: graph
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
## tapply, union, unique, unsplit, which.max, which.min
## Loading required package: RBGL
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:RBGL':
##
## bfs, dfs, transitivity
## The following objects are masked from 'package:graph':
##
## degree, edges, intersection, union
## The following objects are masked from 'package:BiocGenerics':
##
## normalize, path, union
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
Simulated dataset
Simulate data and generate P values of tests on pairwise correlations
To illustrate the process, we simulate data (\(n\) observations, \(p\) features) from a multivariate normal
distribution, which is then used to generate data on \({p}\choose{2}\) pairwise correlations data
through the simulation of random data, which follows a multivariate
normal distribution. Subsequently, we use the function
pval.perm.corr()
to conduct multiple nonparametric
permutation tests for all the pairwise correlations within the simulated
data. Using the function and obtaining corresponding P values, we obtain
the corresponding P-values from these permutation tests.
Calculate node scores and edge scores
To illustrate our approach, we begin by simulating a network of 40
nodes. We generate P values for all nodes and potential connecting edges
using the uniform.beta.node.edge.score()
function. The
output of this function is a list consisting of node scores
(NodeScore
), edge scores (EdgeScore
), and the
network data in the form of an igraph object (Network
).
# simulate the p values for all the possible edges in the network
ind.pos.pval.edge<-rbinom(40*39/2,1,0.5)
pval.edge<-(1-ind.pos.pval.edge)*runif(40*39/2)+ind.pos.pval.edge*rbeta(40*39/2,0.1,1)
names(pval.edge)<-unlist(sapply(1:39,function(i){sapply((i+1):40, function(j){paste(paste("Var",
as.character(i),sep=""),paste("Var",as.character(j),sep=""),sep="_")})}))
# simulate p values for all the nodes in the network
ind.pos.pval.node<-rbinom(40,1,0.2)
pval.node<-(1-ind.pos.pval.node)*runif(40)+ind.pos.pval.node*rbeta(40,0.1,1)
names(pval.node)<-paste("Var", as.character(1:40),sep="")
# generate the node score-NodeScore, edge scores-EdgeScore and igraph object-Network
network.test<-uniform.beta.node.edge.score(pval.node,pval.edge,0.05,0.05,dat1)
#igraph object-Network
network.test$Network
## IGRAPH 00d8c0a UN-- 40 780 --
## + attr: name (v/c), score (v/n), score (e/n)
## + edges from 00d8c0a (vertex names):
## [1] Var1--Var2 Var1--Var3 Var1--Var4 Var1--Var5 Var1--Var6 Var1--Var7
## [7] Var1--Var8 Var1--Var9 Var1--Var10 Var1--Var11 Var1--Var12 Var1--Var13
## [13] Var1--Var14 Var1--Var15 Var1--Var16 Var1--Var17 Var1--Var18 Var1--Var19
## [19] Var1--Var20 Var1--Var21 Var1--Var22 Var1--Var23 Var1--Var24 Var1--Var25
## [25] Var1--Var26 Var1--Var27 Var1--Var28 Var1--Var29 Var1--Var30 Var1--Var31
## [31] Var1--Var32 Var1--Var33 Var1--Var34 Var1--Var35 Var1--Var36 Var1--Var37
## [37] Var1--Var38 Var1--Var39 Var1--Var40 Var2--Var3 Var2--Var4 Var2--Var5
## [43] Var2--Var6 Var2--Var7 Var2--Var8 Var2--Var9
## + ... omitted several edges
## Var1 Var10 Var11 Var12 Var13 Var14 Var15
## -5.0385125 24.7623340 -4.2233614 -4.9367917 -3.6975756 -4.5012564 -4.7563504
## Var16 Var17 Var18 Var19 Var2 Var20 Var21
## -4.0878448 -3.1971996 -4.6135340 -4.8023226 -3.6698740 -4.2590304 -4.7897439
## Var22 Var23 Var24 Var25 Var26 Var27 Var28
## -4.4841002 -3.6622539 -3.1075883 -4.2561739 -4.3776464 -3.8730274 -5.0174061
## Var29 Var3 Var30 Var31 Var32 Var33 Var34
## -4.9647419 -4.2040447 -4.8805226 -3.5676581 -3.3366449 -4.7895373 -3.0806665
## Var35 Var36 Var37 Var38 Var39 Var4 Var40
## -3.2053499 -4.3191783 16.7121589 -4.9735847 4.9885309 -3.9865984 -4.4997289
## Var5 Var6 Var7 Var8 Var9
## -1.8135448 -4.1830772 -3.8703460 -4.5694404 -0.2631627
Detect optimized subnetworks
To detect active subnetworks within our network, we illustrate the
approach by Dittrich et al. in 2008 and compare to the
wActNet
algorithm. The algorithm proposed by Dittrich et
al. in 2008 uses node scores only. The wActNet
algorithm
combines node scores and edge scores using the
MultiModuleFind
function. Within the
MultiModuleFind
function, we have the flexibility to set
the maximum number of active subnetworks (ncluster
) and
choose between two methods: the one presented by Dittrich et al. in 2008
(method=NodeOnly
) or the wActNet approach
(method=NodeEdge
).
Furthermore, the wActNet algorithm allows us to fine-tune the
optimization of active subnetworks by adjusting the parameter
weightratio.edge.node
. This parameter represents the
weighted ratio between edge scores and node scores in the objective
function.
In summary, by combining the methodology introduced by Dittrich et
al. (2008) with the wActNet
extension that includes both
node and edge scores, we can effectively detect the optimized active
subnetworks on network data.
Please note when the function completes execution and fails to identify any further optimized or active subnetworks, a warning message “No positive nodes” will be generated.
network.test1<-network.test$Network
node.scores<-network.test$NodeScore
edge.scores<-network.test$EdgeScore
#Note: both BioNet and wActNet R packages stop running and can't detect any optimized/active subnetwork further and complete, a warning output message-"No positive nodes" will be generated.
#identify all possible optimized subnetworks in simulated network
# using Dittrich's method using node scores only
multi.mod.n<-MultiModuleFind(network.test1,node.scores,edge.scores,
weightratio.edge.node=1,ncluster=3,method="NodeOnly")
## Warning in runFastHeinz(network.after.cluster1, node.scores2): No positive
## nodes
#all possible optimized subnetworks in simulated network using Dittrich's method using node scores only
multi.mod.n
## [[1]]
## IGRAPH 00e41fb UN-- 3 3 --
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 00e41fb (vertex names):
## [1] Var10--Var37 Var10--Var39 Var37--Var39
#visualize the final illustrated detected optimized subnetwork by BioNet algorithm
#for edges, edge score >=0, solid line, edge score < 0, dash line
#for nodes, node score >=0, red color, edge score < 0, grey color
#the size/length of node or edges proportional to node or edge score respectively
#need R package-visNetwork, if not, please install from CRAN
require(visNetwork, quietly = TRUE)
dat_node<-multi.mod.n[[1]]
E(dat_node)$dashes<-ifelse(E(dat_node)$score>0,FALSE,FALSE)
E(dat_node)$width<-(E(dat_node)$score+1.5)*0.3
V(dat_node)$color<-ifelse(V(dat_node)$score>0,"#E7298A","grey")
V(dat_node)$size<-(V(dat_node)$score+11)*0.5
E(dat_node)$color<-ifelse(E(dat_node)$score>0,"#66A61E","grey")
#########################################################################################
# setwd("/Users/yubingyao/Google Drive/Network analysis/R code/")
# save(GU2,file="network2_2clusters_plot.RData")
#load("network_2clusters_plot.RData")
data_node<- toVisNetworkData(dat_node)
visNetwork(nodes = data_node$nodes, edges = data_node$edges, height = "800px")%>% visNodes(font=list(size=20))
# identify all possible signaling modules in simulated network
# using our proposed method based on node scores and edge scores
multi.mod.e<-MultiModuleFind(network.test1,node.scores,edge.scores,
weightratio.edge.node=1,ncluster=3,method="NodeEdge")
#all possible optimized subnetworks in simulated network using our proposed method based on node scores and edge scores
multi.mod.e
## [[1]]
## IGRAPH 010e1b7 UN-- 10 45 --
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 010e1b7 (vertex names):
## [1] Var9 --Var2 Var10--Var2 Var17--Var2 Var18--Var2 Var2 --Var21
## [6] Var23--Var2 Var24--Var2 Var2 --Var34 Var38--Var2 Var10--Var9
## [11] Var17--Var9 Var18--Var9 Var9 --Var21 Var23--Var9 Var24--Var9
## [16] Var9 --Var34 Var38--Var9 Var10--Var17 Var10--Var18 Var10--Var21
## [21] Var10--Var23 Var10--Var24 Var10--Var34 Var10--Var38 Var17--Var18
## [26] Var17--Var21 Var23--Var17 Var24--Var17 Var17--Var34 Var17--Var38
## [31] Var18--Var21 Var23--Var18 Var24--Var18 Var18--Var34 Var18--Var38
## [36] Var23--Var21 Var24--Var21 Var21--Var34 Var38--Var21 Var24--Var23
## + ... omitted several edges
##
## [[2]]
## IGRAPH 0121c96 UN-- 4 6 --
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 0121c96 (vertex names):
## [1] Var13--Var6 Var6 --Var28 Var37--Var6 Var13--Var28 Var37--Var13
## [6] Var37--Var28
##
## [[3]]
## IGRAPH 012716a UN-- 10 45 --
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 012716a (vertex names):
## [1] Var4 --Var1 Var12--Var1 Var16--Var1 Var20--Var1 Var1 --Var22
## [6] Var1 --Var27 Var1 --Var32 Var33--Var1 Var39--Var1 Var4 --Var12
## [11] Var16--Var4 Var4 --Var20 Var4 --Var22 Var4 --Var27 Var4 --Var32
## [16] Var33--Var4 Var39--Var4 Var16--Var12 Var20--Var12 Var12--Var22
## [21] Var12--Var27 Var12--Var32 Var33--Var12 Var39--Var12 Var16--Var20
## [26] Var16--Var22 Var16--Var27 Var16--Var32 Var16--Var33 Var39--Var16
## [31] Var20--Var22 Var20--Var27 Var20--Var32 Var33--Var20 Var39--Var20
## [36] Var27--Var22 Var32--Var22 Var33--Var22 Var39--Var22 Var32--Var27
## + ... omitted several edges
#visualize the final illustrated detected optimized subnetwork by our proposed algorithm
#for edges, edge score >=0, solid line, edge score < 0, dash line
#for nodes, node score >=0, red color, edge score < 0, grey color
#the size/length of node or edges proportional to node or edge score respectively
#need R package-visNetwork, if not, please install from CRAN
#require(visNetwork, quietly = TRUE)
dat_nodeedge<-multi.mod.e[[1]]
E(dat_nodeedge)$dashes<-ifelse(E(dat_nodeedge)$score>0,FALSE,FALSE)
E(dat_nodeedge)$width<-(E(dat_nodeedge)$score+1.5)*0.3
V(dat_nodeedge)$color<-ifelse(V(dat_nodeedge)$score>0,"#E7298A","grey")
V(dat_nodeedge)$size<-(V(dat_nodeedge)$score+11)*0.5
E(dat_nodeedge)$color<-ifelse(E(dat_nodeedge)$score>0,"#66A61E","grey")
#########################################################################################
# setwd("/Users/yubingyao/Google Drive/Network analysis/R code/")
# save(GU2,file="network2_2clusters_plot.RData")
#load("network_2clusters_plot.RData")
data_nodeedge <- toVisNetworkData(dat_nodeedge)
visNetwork(nodes = data_nodeedge$nodes, edges = data_nodeedge$edges, height = "800px")%>% visNodes(font=list(size=20))
Application on the dataset of ovarian cancer
This uses a set of public ovarian cancer data in the curatedOvarianData R package. See “??curatedOvarianData” in the RStudio console for more help with this dataset: “This package represents a manually curated data collection for gene expression meta-analysis of patients with ovarian cancer. This resource provides uniformly prepared microarray data with curated and documented clinical metadata. It allows a computational user to efficiently curatedOvarianData identify studies and patient subgroups of interest for analysis and to run such analyses immediately without the challenges posed by harmonizing heterogeneous microarray technologies, study designs, expression data processing methods, and clinical data formats.”
As with the simulation above, begin by installing and loading the devtools package. Use the install_github function to install wActNet from GitHub as shown below. Our package depends on two additional R packages: igraph and BioNet. Additionally, install the curatedOvarianData package containing the data to be used in this application example.
Because this is full-transcriptome data, we will load the data and select only a subset of it.
Install and load packages for downloading and preprocessing of data and for running wActNet
start.time <- Sys.time()
library(devtools)
# Uncomment these lines to install the affy and curatedOvarianData package
# Takes time, so re-comment after
# if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
# BiocManager::install(version = "3.19")
# library(BiocManager)
# BiocManager::install("affy")
# BiocManager::install("curatedOvarianData")
#library(Biobase)
#install the newest version of wActNet package
#install_github("yuy113/wActNet", force = TRUE)
library(igraph)
library(affy)
#library(BioNet)
library(dplyr)
library(stringr)
#library(wActNet)
library(curatedOvarianData)
Load and select a subset of the ovarian cancer data.
standardize = function(x){return((x-mean(x))/sd(x))}
data(GSE32062.GPL6480_eset)
lateStage = exprs(GSE32062.GPL6480_eset)
For a straightforward example on a small set of genes, use the chunk below:
# Extract a subset of genes related to ovarian carcinoma
# based on the Human Phenotype Ontology
# See https://hpo.jax.org/app/browse/term/HP:0025318
library(tidyr)
library(tidyverse)
lateStageSmall = lateStage[c(1680,1681,3027,4564,8930,12243,12245,13694,13695,13701,13979,16082,16875,17980),] %>%
t() %>%
data.frame() %>%
apply(.,2,standardize)
Use the code chunk below to instead randomly select a larger group of genes. Note that wActNet does not handle gene names with certain symbols, for example “///” in the ovarian cancer dataset. Additionally, underscores will not work because this symbol is used later in order to connect two node names. We provide an example workaround below:
# select a random subset of genes to test different sizes
setSize = 500
set.seed(1989)
# remove genes with weird symbols in the names as this messes up wActNet
row.names(lateStage) <- gsub("///", "-", row.names(lateStage))
geneIdx = sample(1:nrow(lateStage), size=setSize,replace=F)
lateStageSmall = lateStage[geneIdx,] %>%
t() %>%
data.frame() %>%
apply(.,2,standardize)
names(lateStageSmall) = colnames(lateStageSmall)
#head(lateStageSmall)
Assigning edge p-values
To generate edge p-values, we use a permutation test on a correlation
network. You can vary nsim
to make this easier to test,
although in practice you’ll want to use a high number.
Assigning node p-values
To assign node p-values associated with each covariate, we will regress the expression of each gene against summary grade. The variable summarygrade is a binarized form of the grade of cancer.
lateStagePheno = phenoData(GSE32062.GPL6480_eset)
gradeInfo = data.frame("sample"=row.names(pData(lateStagePheno)),
"summarygrade"=pData(lateStagePheno)$summarygrade)
regressionDataset = lateStageSmall %>%
data.frame() %>%
mutate("sample"=row.names(lateStageSmall)) %>%
inner_join(gradeInfo,by="sample")
To get node weights, we regress gene expression against summary grade and take the resulting p-values. Note that we take the raw p-values and not the FDR, as the conversion to node scores in wActNet provides FDR control.
##
## Attaching package: 'limma'
## The following object is masked from 'package:BiocGenerics':
##
## plotMA
design_matrix = model.matrix( ~ factor(summarygrade),data=regressionDataset)
limma_res = regressionDataset %>% select(-c(sample,summarygrade)) %>% t() %>%
lmFit(object = ., design = design_matrix) %>%
eBayes()
node_table = limma_res %>%
topTable(number = Inf) %>%
data.frame()
## Removing intercept from test coefficients
Running wActNet
Converting p-values to node scores
The uniform.beta.node.edge.score
function is provided in
wActNet
for converting the list of p-values to scores, in
which a higher score corresponds to a lower p-value and vice-versa. FDR
control is implemented such that negative scores correspond to p-values
above the FDR threshold and positive scores correspond to p-values that
are FDR-significant (i.e., at or below the FDR threshold).
Please note: “Warning: One or both parameters are on the limit of the defined parameter space” indicates that either too many or too few nodes have statistically significant p-values (FDR-adjusted p > 0.05). This suggests that rather than the null uniform-beta mixture distribution, the actual distribution is too close to either the uniform or the beta distribution alone. The user should expand the grid of possible values to be closer to the boundary, using the optim.method, optim.lower, optim.upper, optim.control, and optim.hessian parameters of the uniform.beta.node.edge.score() function.
FDR.node=0.1
FDR.edge=0.05
networkPrep = uniform.beta.node.edge.score(pval.node = node_p_vals, pval.edge = edge_p_vals, FDR.node = FDR.node, FDR.edge = FDR.edge, dat = lateStageSmall, optim.method="Nelder-Mead", optim.lower = -Inf, optim.upper = Inf, optim.control = list(), optim.hessian = FALSE)
hist(node_p_vals, breaks = 50)
Running module detection
The MultiModuleFind
function is used to find active
subnetworks. It is an iterative algorithm that finds the largest scoring
subnetwork, then removes it from the network and repeats the process.
The number of modules ncluster
must be prespecified. As
above, please note that “Warning: No positive nodes” indicates that the
function completed algorithm execution before reaching
ncluster
and failed to identify any further optimized or
active subnetworks. Here, as with the simulated data above, we compare
our proposed method based on both node scores and edge scores with
Dittrich’s method using node scores only.
#all possible optimized subnetworks in simulated network using Dittrich's method using node scores only
modules.n = MultiModuleFind(network = networkPrep$Network,
node.scores = networkPrep$NodeScore,
edge.scores = networkPrep$EdgeScore,
ncluster = 2,
method = "NodeOnly")
## Warning in runFastHeinz(network.after.cluster1, node.scores2): No positive
## nodes
#all possible optimized subnetworks in simulated network using our proposed method based on node scores and edge scores
modules.ne = MultiModuleFind(network = networkPrep$Network,
node.scores = networkPrep$NodeScore,
edge.scores = networkPrep$EdgeScore,
ncluster = 2,
method = "NodeEdge")
## Warning in runFastHeinz.e(network.after.cluster1, node.scores2, edge.scores2, :
## No positive nodes
Examine results
## [[1]]
## IGRAPH 8c29a26 UN-- 11 55 --
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 8c29a26 (vertex names):
## [1] SIRPB1 --STAT1 IGKV1OR2.108--PHF21B CD163 --PODXL
## [4] ARFRP1 --SIRPB1 ARFRP1 --STAT1 DCXR --SIRPB1
## [7] DCXR --STAT1 SIRPB1 --GPR114 STAT1 --GPR114
## [10] IGKV1OR2.108--SIRPB1 IGKV1OR2.108--STAT1 PHF21B --SIRPB1
## [13] PHF21B --STAT1 SIRPB1 --TNFRSF17 STAT1 --TNFRSF17
## [16] SIRPB1 --TMEM140 STAT1 --TMEM140 PODXL --SIRPB1
## [19] PODXL --STAT1 CD163 --SIRPB1 CD163 --STAT1
## [22] ARFRP1 --DCXR ARFRP1 --GPR114
## + ... omitted several edges
Note that if a null result where the number of nodes and edges are both zero is returned, you may need to either increase the sample size of genes being used or relax the FDR in order to find active subnetworks. Increasing the sample size of genes will give more possible combinations of subnetwork candidates, and relaxing the FDR will allow subnetworks that are less significant to be detected.
This is a list of edges that shows the connected vertices in the active subnetwork that was detected. Note that the number of nodes is 46, and the number of edges is \(46 \choose 2\) = 1035, the maximum number for a network with 46 nodes.
Note that if more than one active subnetwork is detected, that second network’s information can be accessed by replacing the “[[1]]” below with “[[2]]”, and so on for successive subnetworks.
## [[1]]
## IGRAPH 261ba58 UN-- 10 45 --
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 261ba58 (vertex names):
## [1] ARFRP1 --CD163 ARFRP1 --DCXR
## [3] CD163 --DCXR ARFRP1 --IGKV1OR2.108
## [5] CD163 --IGKV1OR2.108 DCXR --IGKV1OR2.108
## [7] ARFRP1 --PHF21B CD163 --PHF21B
## [9] DCXR --PHF21B IGKV1OR2.108--PHF21B
## [11] ARFRP1 --PODXL CD163 --PODXL
## [13] DCXR --PODXL IGKV1OR2.108--PODXL
## [15] PHF21B --PODXL ARFRP1 --SIRPB1
## + ... omitted several edges
#list of active subnetworks detected using our proposed method (node scores and edge scores)
modules.ne
## [[1]]
## IGRAPH 8c29a26 UN-- 11 55 --
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 8c29a26 (vertex names):
## [1] SIRPB1 --STAT1 IGKV1OR2.108--PHF21B CD163 --PODXL
## [4] ARFRP1 --SIRPB1 ARFRP1 --STAT1 DCXR --SIRPB1
## [7] DCXR --STAT1 SIRPB1 --GPR114 STAT1 --GPR114
## [10] IGKV1OR2.108--SIRPB1 IGKV1OR2.108--STAT1 PHF21B --SIRPB1
## [13] PHF21B --STAT1 SIRPB1 --TNFRSF17 STAT1 --TNFRSF17
## [16] SIRPB1 --TMEM140 STAT1 --TMEM140 PODXL --SIRPB1
## [19] PODXL --STAT1 CD163 --SIRPB1 CD163 --STAT1
## [22] ARFRP1 --DCXR ARFRP1 --GPR114
## + ... omitted several edges
Explore network information:
## [1] 0.32868358 0.53678782 0.01308877 2.64488587 2.29957275 1.64071619
## [7] 1.59621002 0.71437065 1.23938062 1.65049686 -1.05252907
## [1] 0.1500247 0.1500247 -0.6897466 -2.1920527 -1.7050822 -1.7876402
## [7] -2.0438967 0.1500247 0.1500247 0.1500247 0.1500247 -0.2116451
## [13] 0.1500247 0.1500247 0.1500247 0.1500247 0.1500247 -0.2116451
## [19] -0.2116451 0.1500247 0.1500247 0.1500247 0.1500247 -2.1978826
## [25] -0.6897466 -2.0438967 -1.8698877 -1.5886505 -2.2149917 0.1500247
## [31] -1.5295179 -1.7615458 -1.1465480 0.1500247 0.1500247 0.1500247
## [37] 0.1500247 0.1500247 0.1500247 0.1500247 -0.2116451 0.1500247
## [43] 0.1500247 0.1500247 0.1500247 0.1500247 -0.2116451 -0.2116451
## [49] 0.1500247 0.1500247 0.1500247 0.1500247 0.1500247 -1.1465480
## [55] 0.1500247
## ARFRP1 CD163 DCXR IGKV1OR2.108 PHF21B PODXL
## 10 10 10 10 10 10
## SIRPB1 STAT1 TMEM140 TNFRSF17 GPR114
## 10 10 10 10 10
Visualize Network Results as above
library(visNetwork)
multi.mod.n <- modules.n
dat_node<-multi.mod.n[[1]]
E(dat_node)$dashes<-ifelse(E(dat_node)$score>0,FALSE,FALSE)
E(dat_node)$width<-(E(dat_node)$score+1.5)*0.3
V(dat_node)$color<-ifelse(V(dat_node)$score>0,"#E7298A","grey")
V(dat_node)$size<-(V(dat_node)$score+11)*0.5
E(dat_node)$color<-ifelse(E(dat_node)$score>0,"#66A61E","grey")
data_node<- toVisNetworkData(dat_node)
visNetwork(nodes = data_node$nodes, edges = data_node$edges, height = "800px")%>% visNodes(font=list(size=20))
multi.mod.ne <- modules.ne
dat_nodeedge<-multi.mod.ne[[1]]
E(dat_nodeedge)$dashes<-ifelse(E(dat_nodeedge)$score>0,FALSE,FALSE)
E(dat_nodeedge)$width<-(E(dat_nodeedge)$score+1.5)*0.3
V(dat_nodeedge)$color<-ifelse(V(dat_nodeedge)$score>0,"#E7298A","grey")
V(dat_nodeedge)$size<-(V(dat_nodeedge)$score+11)*0.5
E(dat_nodeedge)$color<-ifelse(E(dat_nodeedge)$score>0,"#66A61E","grey")
data_nodeedge <- toVisNetworkData(dat_nodeedge)
visNetwork(nodes = data_nodeedge$nodes, edges = data_nodeedge$edges, height = "800px")%>% visNodes(font=list(size=20))
Runtime dependence on FDR and Number of Genes
## [1] "FDR node: " "0.1"
## [1] "FDR edge: " "0.05"
## [1] "Number of Genes: " "500"
## [1] "Runtime: " "3.96293711662292"
##
## "Number of Nodes in Active Subnetwork: "
## Length
## "11"
Selected runtime examples from testing
The two plots below show the impact of the number of genes and the selected FDR values on the runtime. The first plot highlights that the runtime of specific wActNet functions increases as the user increases the number of input genes.
library(tidyverse)
library(ggplot2)
NumGenes <- c(100, 300, 500, 1000)
pval.perm.corr <- c(1, 4, 12, 42)
MultiModuleFind <- c(.1, 2, 5, 196) #minutes
runtime_dat <- as.data.frame(cbind(NumGenes, pval.perm.corr, MultiModuleFind))
runtime_dat %>% pivot_longer(2:3, names_to = "Function") %>% ggplot() + geom_point(mapping = aes(x=NumGenes, y = value, color = Function, size = 3), show.legend = c(color=T,size=F)) + theme(plot.caption = element_text(hjust = 0)) + labs(title = "Runtime vs. Number of Genes for Two wActNet Functions", x = "Number of Genes", y = "Runtime in Minutes", caption = str_wrap("This plot highlights how the runtime of specific wActNet functions increases as the user increases the number of input genes. The `MultiModuleFind` function is an iterative algorithm used to find active subnetworks. The `pval.perm.corr` function performs a permutation test on a correlation network to generate edge p-values.", 100))
FDR <- c(0.05, 0.1, 0.5)
runtimes100genes <- c(36/60, 36/60, 36/60) #seconds/(60s/min) = min
runtimes500genes <- c(15, 21, 48) #minutes
runtime_dat <- as.data.frame(cbind(FDR, runtimes100genes, runtimes500genes))
runtime_dat %>% pivot_longer(2:3, names_to = "Number of Genes") %>% ggplot() + geom_point(mapping = aes(x=FDR, y = value, color = `Number of Genes`, size = 3), show.legend = c(color=T,size=F)) + theme(plot.caption = element_text(hjust = 0)) + labs(title = "Runtime vs. FDR for full script", x = "FDR", y = "Runtime in Minutes", caption = str_wrap("The second plot shows runtimes of a full script, including Dittrich's method and our proposed method, as well as visualizations for both of their respective network outputs. This plot highlights that relaxing the FDR (which allows more nodes to pass the significance threshold, and therefore creates more complex networks) results in longer runtimes.", 100))
Contact Us / Contribute
This package is new, and any and all suggestions are welcomed. You can use GitHub to raise issues, contribute, or communicate with us about the package:
References
Dittrich, M., Klau, G., Rosenwald, A., Dandekar, T., Müller, T.: Identifying functional modules in protein-protein interaction networks: an integrated exact approach. Bioinformatics 24(13), 223–231 (2008)
Beisser, D., Klau, G., Dandekar, T., Müller, T., Dittrich, M.: Bionet: an r-package for the functional analysis of biological networks. Bioinformatics 26(8), 1129–1130 (2010)
Ganzfried BF, Riester M, Haibe-Kains B, Risch T, Tyekucheva S, Jazic I, Wang XV, Ahmadifar M, Birrer M, Parmigiani G, Huttenhower C, Waldron L: curatedOvarianData: Clinically Annotated Data for the Ovarian Cancer Transcriptome. Database, Volume 2013. doi:10.1093/database/bat013, https://academic.oup.com/database/article/doi/10.1093/database/bat013/330978