Active subnetwork detection Quick Start

Install and Load the `wActNet` Package

Begin by installing and loading the devtools package. Use the install_github function to install wActNet from GitHub as shown below. Our package depends on two additional R packages, igraph and BioNet, and the second one can be downloaded from BioConductor https://www.bioconductor.org/install/.

#please comment the below code out if you already install the package-wActNet from github
#library(devtools)
#if (!requireNamespace("BiocManager", quietly = TRUE))
#  install.packages("BiocManager")
#BiocManager::install("pathview")
#install_github("yuy113/wActNet")
#BiocManager::install(c("BioNet"))
#foreign, MASS, nlme, survival
library(wActNet)
#R package BioNet from BioConductor 
library(BioNet)

## Loading required package: graph

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
##     tapply, union, unique, unsplit, which.max, which.min

## Loading required package: RBGL

library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:RBGL':
## 
##     bfs, dfs, transitivity

## The following objects are masked from 'package:graph':
## 
##     degree, edges, intersection, union

## The following objects are masked from 'package:BiocGenerics':
## 
##     normalize, path, union

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

Simulated dataset

Simulate data and generate P values of tests on pairwise correlations

To illustrate the process, we simulate data (\(n\) observations, \(p\) features) from a multivariate normal distribution, which is then used to generate data on \({p}\choose{2}\) pairwise correlations data through the simulation of random data, which follows a multivariate normal distribution. Subsequently, we use the function pval.perm.corr() to conduct multiple nonparametric permutation tests for all the pairwise correlations within the simulated data. Using the function and obtaining corresponding P values, we obtain the corresponding P-values from these permutation tests.

dat1<-matrix(rnorm(4000),ncol=40,nrow=100)
colnames(dat1)<-paste("Var", as.character(1:40),sep="")
pval.edge<-pval.perm.corr(dat1,nsim=100,MatchId=NULL,do.parallel=FALSE)
#pval.edge

Calculate node scores and edge scores

To illustrate our approach, we begin by simulating a network of 40 nodes. We generate P values for all nodes and potential connecting edges using the uniform.beta.node.edge.score() function. The output of this function is a list consisting of node scores (NodeScore), edge scores (EdgeScore), and the network data in the form of an igraph object (Network).

# simulate the p values for all the possible edges in the network
ind.pos.pval.edge<-rbinom(40*39/2,1,0.5)
pval.edge<-(1-ind.pos.pval.edge)*runif(40*39/2)+ind.pos.pval.edge*rbeta(40*39/2,0.1,1)
names(pval.edge)<-unlist(sapply(1:39,function(i){sapply((i+1):40, function(j){paste(paste("Var",
as.character(i),sep=""),paste("Var",as.character(j),sep=""),sep="_")})}))

# simulate p values for all the nodes in the network
ind.pos.pval.node<-rbinom(40,1,0.2)
pval.node<-(1-ind.pos.pval.node)*runif(40)+ind.pos.pval.node*rbeta(40,0.1,1)
names(pval.node)<-paste("Var", as.character(1:40),sep="")

# generate the node score-NodeScore, edge scores-EdgeScore and igraph object-Network
network.test<-uniform.beta.node.edge.score(pval.node,pval.edge,0.05,0.05,dat1)

#igraph object-Network
network.test$Network

## IGRAPH 00d8c0a UN-- 40 780 -- 
## + attr: name (v/c), score (v/n), score (e/n)
## + edges from 00d8c0a (vertex names):
##  [1] Var1--Var2  Var1--Var3  Var1--Var4  Var1--Var5  Var1--Var6  Var1--Var7 
##  [7] Var1--Var8  Var1--Var9  Var1--Var10 Var1--Var11 Var1--Var12 Var1--Var13
## [13] Var1--Var14 Var1--Var15 Var1--Var16 Var1--Var17 Var1--Var18 Var1--Var19
## [19] Var1--Var20 Var1--Var21 Var1--Var22 Var1--Var23 Var1--Var24 Var1--Var25
## [25] Var1--Var26 Var1--Var27 Var1--Var28 Var1--Var29 Var1--Var30 Var1--Var31
## [31] Var1--Var32 Var1--Var33 Var1--Var34 Var1--Var35 Var1--Var36 Var1--Var37
## [37] Var1--Var38 Var1--Var39 Var1--Var40 Var2--Var3  Var2--Var4  Var2--Var5 
## [43] Var2--Var6  Var2--Var7  Var2--Var8  Var2--Var9 
## + ... omitted several edges

#node scores
network.test$NodeScore

##       Var1      Var10      Var11      Var12      Var13      Var14      Var15 
## -5.0385125 24.7623340 -4.2233614 -4.9367917 -3.6975756 -4.5012564 -4.7563504 
##      Var16      Var17      Var18      Var19       Var2      Var20      Var21 
## -4.0878448 -3.1971996 -4.6135340 -4.8023226 -3.6698740 -4.2590304 -4.7897439 
##      Var22      Var23      Var24      Var25      Var26      Var27      Var28 
## -4.4841002 -3.6622539 -3.1075883 -4.2561739 -4.3776464 -3.8730274 -5.0174061 
##      Var29       Var3      Var30      Var31      Var32      Var33      Var34 
## -4.9647419 -4.2040447 -4.8805226 -3.5676581 -3.3366449 -4.7895373 -3.0806665 
##      Var35      Var36      Var37      Var38      Var39       Var4      Var40 
## -3.2053499 -4.3191783 16.7121589 -4.9735847  4.9885309 -3.9865984 -4.4997289 
##       Var5       Var6       Var7       Var8       Var9 
## -1.8135448 -4.1830772 -3.8703460 -4.5694404 -0.2631627

#edge scores
#network.test$EdgeScore

Detect optimized subnetworks

To detect active subnetworks within our network, we illustrate the approach by Dittrich et al. in 2008 and compare to the wActNet algorithm. The algorithm proposed by Dittrich et al. in 2008 uses node scores only. The wActNet algorithm combines node scores and edge scores using the MultiModuleFind function. Within the MultiModuleFind function, we have the flexibility to set the maximum number of active subnetworks (ncluster) and choose between two methods: the one presented by Dittrich et al. in 2008 (method=NodeOnly) or the wActNet approach (method=NodeEdge).

Furthermore, the wActNet algorithm allows us to fine-tune the optimization of active subnetworks by adjusting the parameter weightratio.edge.node. This parameter represents the weighted ratio between edge scores and node scores in the objective function.

In summary, by combining the methodology introduced by Dittrich et al. (2008) with the wActNet extension that includes both node and edge scores, we can effectively detect the optimized active subnetworks on network data.

Please note when the function completes execution and fails to identify any further optimized or active subnetworks, a warning message “No positive nodes” will be generated.

network.test1<-network.test$Network
node.scores<-network.test$NodeScore
edge.scores<-network.test$EdgeScore

#Note: both BioNet and wActNet R packages stop running and can't detect any optimized/active subnetwork further and complete, a warning output message-"No positive nodes" will be generated.
#identify all possible optimized subnetworks in simulated network
# using Dittrich's method using node scores only
multi.mod.n<-MultiModuleFind(network.test1,node.scores,edge.scores,
weightratio.edge.node=1,ncluster=3,method="NodeOnly")

## Warning in runFastHeinz(network.after.cluster1, node.scores2): No positive
## nodes

#all possible optimized subnetworks in simulated network using Dittrich's method using node scores only
multi.mod.n

## [[1]]
## IGRAPH 00e41fb UN-- 3 3 -- 
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 00e41fb (vertex names):
## [1] Var10--Var37 Var10--Var39 Var37--Var39

#visualize the final illustrated detected optimized subnetwork by BioNet algorithm
#for edges, edge score >=0, solid line, edge score < 0, dash line
#for nodes, node score >=0, red color, edge score < 0, grey color 
#the size/length of node or edges proportional to node or edge score respectively
#need R package-visNetwork, if not, please install from CRAN
require(visNetwork, quietly = TRUE)
dat_node<-multi.mod.n[[1]]
 E(dat_node)$dashes<-ifelse(E(dat_node)$score>0,FALSE,FALSE)
  E(dat_node)$width<-(E(dat_node)$score+1.5)*0.3
  V(dat_node)$color<-ifelse(V(dat_node)$score>0,"#E7298A","grey")
  V(dat_node)$size<-(V(dat_node)$score+11)*0.5
  
  E(dat_node)$color<-ifelse(E(dat_node)$score>0,"#66A61E","grey")
  
  
 #########################################################################################
  # setwd("/Users/yubingyao/Google Drive/Network analysis/R code/")
  
 # save(GU2,file="network2_2clusters_plot.RData")
  
  
  
  #load("network_2clusters_plot.RData")
  
  data_node<- toVisNetworkData(dat_node)
  
  visNetwork(nodes = data_node$nodes, edges = data_node$edges, height = "800px")%>% visNodes(font=list(size=20))

# identify all possible signaling modules in simulated network
# using our proposed method based on node scores and edge scores
multi.mod.e<-MultiModuleFind(network.test1,node.scores,edge.scores,
weightratio.edge.node=1,ncluster=3,method="NodeEdge")
#all possible optimized subnetworks in simulated network using our proposed method based on node scores and edge scores
multi.mod.e

## [[1]]
## IGRAPH 010e1b7 UN-- 10 45 -- 
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 010e1b7 (vertex names):
##  [1] Var9 --Var2  Var10--Var2  Var17--Var2  Var18--Var2  Var2 --Var21
##  [6] Var23--Var2  Var24--Var2  Var2 --Var34 Var38--Var2  Var10--Var9 
## [11] Var17--Var9  Var18--Var9  Var9 --Var21 Var23--Var9  Var24--Var9 
## [16] Var9 --Var34 Var38--Var9  Var10--Var17 Var10--Var18 Var10--Var21
## [21] Var10--Var23 Var10--Var24 Var10--Var34 Var10--Var38 Var17--Var18
## [26] Var17--Var21 Var23--Var17 Var24--Var17 Var17--Var34 Var17--Var38
## [31] Var18--Var21 Var23--Var18 Var24--Var18 Var18--Var34 Var18--Var38
## [36] Var23--Var21 Var24--Var21 Var21--Var34 Var38--Var21 Var24--Var23
## + ... omitted several edges
## 
## [[2]]
## IGRAPH 0121c96 UN-- 4 6 -- 
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 0121c96 (vertex names):
## [1] Var13--Var6  Var6 --Var28 Var37--Var6  Var13--Var28 Var37--Var13
## [6] Var37--Var28
## 
## [[3]]
## IGRAPH 012716a UN-- 10 45 -- 
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 012716a (vertex names):
##  [1] Var4 --Var1  Var12--Var1  Var16--Var1  Var20--Var1  Var1 --Var22
##  [6] Var1 --Var27 Var1 --Var32 Var33--Var1  Var39--Var1  Var4 --Var12
## [11] Var16--Var4  Var4 --Var20 Var4 --Var22 Var4 --Var27 Var4 --Var32
## [16] Var33--Var4  Var39--Var4  Var16--Var12 Var20--Var12 Var12--Var22
## [21] Var12--Var27 Var12--Var32 Var33--Var12 Var39--Var12 Var16--Var20
## [26] Var16--Var22 Var16--Var27 Var16--Var32 Var16--Var33 Var39--Var16
## [31] Var20--Var22 Var20--Var27 Var20--Var32 Var33--Var20 Var39--Var20
## [36] Var27--Var22 Var32--Var22 Var33--Var22 Var39--Var22 Var32--Var27
## + ... omitted several edges

#visualize the final illustrated detected optimized subnetwork by our proposed algorithm
#for edges, edge score >=0, solid line, edge score < 0, dash line
#for nodes, node score >=0, red color, edge score < 0, grey color 
#the size/length of node or edges proportional to node or edge score respectively
#need R package-visNetwork, if not, please install from CRAN
#require(visNetwork, quietly = TRUE)
dat_nodeedge<-multi.mod.e[[1]]
 E(dat_nodeedge)$dashes<-ifelse(E(dat_nodeedge)$score>0,FALSE,FALSE)
  E(dat_nodeedge)$width<-(E(dat_nodeedge)$score+1.5)*0.3
  V(dat_nodeedge)$color<-ifelse(V(dat_nodeedge)$score>0,"#E7298A","grey")
  V(dat_nodeedge)$size<-(V(dat_nodeedge)$score+11)*0.5
  
  E(dat_nodeedge)$color<-ifelse(E(dat_nodeedge)$score>0,"#66A61E","grey")
  
  
 #########################################################################################
  # setwd("/Users/yubingyao/Google Drive/Network analysis/R code/")
  
 # save(GU2,file="network2_2clusters_plot.RData")
  
  
  
  #load("network_2clusters_plot.RData")
  
  data_nodeedge <- toVisNetworkData(dat_nodeedge)
  
  visNetwork(nodes = data_nodeedge$nodes, edges = data_nodeedge$edges, height = "800px")%>% visNodes(font=list(size=20))

Application on the dataset of ovarian cancer

This uses a set of public ovarian cancer data in the curatedOvarianData R package. See “??curatedOvarianData” in the RStudio console for more help with this dataset: “This package represents a manually curated data collection for gene expression meta-analysis of patients with ovarian cancer. This resource provides uniformly prepared microarray data with curated and documented clinical metadata. It allows a computational user to efficiently curatedOvarianData identify studies and patient subgroups of interest for analysis and to run such analyses immediately without the challenges posed by harmonizing heterogeneous microarray technologies, study designs, expression data processing methods, and clinical data formats.”

As with the simulation above, begin by installing and loading the devtools package. Use the install_github function to install wActNet from GitHub as shown below. Our package depends on two additional R packages: igraph and BioNet. Additionally, install the curatedOvarianData package containing the data to be used in this application example.

Because this is full-transcriptome data, we will load the data and select only a subset of it.

Install and load packages for downloading and preprocessing of data and for running wActNet

start.time <- Sys.time()
library(devtools)


# Uncomment these lines to install the affy and curatedOvarianData package 
# Takes time, so re-comment after

# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install(version = "3.19")
# library(BiocManager)

# BiocManager::install("affy")
# BiocManager::install("curatedOvarianData")

#library(Biobase)

#install the newest version of wActNet package
#install_github("yuy113/wActNet", force = TRUE)

library(igraph)


library(affy)
#library(BioNet)
library(dplyr)
library(stringr)
#library(wActNet)
library(curatedOvarianData)

Load and select a subset of the ovarian cancer data.

standardize = function(x){return((x-mean(x))/sd(x))}

data(GSE32062.GPL6480_eset)
lateStage = exprs(GSE32062.GPL6480_eset)

For a straightforward example on a small set of genes, use the chunk below:

# Extract a subset of genes related to ovarian carcinoma 
# based on the Human Phenotype Ontology
# See https://hpo.jax.org/app/browse/term/HP:0025318
library(tidyr)

library(tidyverse)
lateStageSmall = lateStage[c(1680,1681,3027,4564,8930,12243,12245,13694,13695,13701,13979,16082,16875,17980),] %>%
  t() %>%
  data.frame() %>%
  apply(.,2,standardize)

Use the code chunk below to instead randomly select a larger group of genes. Note that wActNet does not handle gene names with certain symbols, for example “///” in the ovarian cancer dataset. Additionally, underscores will not work because this symbol is used later in order to connect two node names. We provide an example workaround below:

# select a random subset of genes to test different sizes
setSize = 500
set.seed(1989)
# remove genes with weird symbols in the names as this messes up wActNet
row.names(lateStage) <- gsub("///", "-", row.names(lateStage))

geneIdx = sample(1:nrow(lateStage), size=setSize,replace=F)
lateStageSmall = lateStage[geneIdx,] %>%
  t() %>%
  data.frame() %>%
  apply(.,2,standardize)
names(lateStageSmall) = colnames(lateStageSmall)
#head(lateStageSmall)

Assigning edge p-values

To generate edge p-values, we use a permutation test on a correlation network. You can vary nsim to make this easier to test, although in practice you’ll want to use a high number.

edge_p_vals = wActNet::pval.perm.corr(dat = lateStageSmall,MatchId = NULL,
               nsim = 100, do.parallel = F, no_cores = 1)

#names(edge_p_vals)

Assigning node p-values

To assign node p-values associated with each covariate, we will regress the expression of each gene against summary grade. The variable summarygrade is a binarized form of the grade of cancer.

lateStagePheno = phenoData(GSE32062.GPL6480_eset)
gradeInfo = data.frame("sample"=row.names(pData(lateStagePheno)),
                       "summarygrade"=pData(lateStagePheno)$summarygrade) 
regressionDataset = lateStageSmall %>% 
  data.frame() %>%
  mutate("sample"=row.names(lateStageSmall)) %>%
  inner_join(gradeInfo,by="sample")

To get node weights, we regress gene expression against summary grade and take the resulting p-values. Note that we take the raw p-values and not the FDR, as the conversion to node scores in wActNet provides FDR control.

# BiocManager::install("limma")

library(limma)

## 
## Attaching package: 'limma'

## The following object is masked from 'package:BiocGenerics':
## 
##     plotMA

design_matrix = model.matrix( ~ factor(summarygrade),data=regressionDataset)
limma_res = regressionDataset %>% select(-c(sample,summarygrade)) %>% t() %>%
  lmFit(object = ., design = design_matrix) %>%
  eBayes()
node_table = limma_res %>%
  topTable(number = Inf) %>%
  data.frame()

## Removing intercept from test coefficients

node_p_vals = node_table$P.Value
names(node_p_vals) = rownames(node_table)

Running wActNet

Converting p-values to node scores

The uniform.beta.node.edge.score function is provided in wActNet for converting the list of p-values to scores, in which a higher score corresponds to a lower p-value and vice-versa. FDR control is implemented such that negative scores correspond to p-values above the FDR threshold and positive scores correspond to p-values that are FDR-significant (i.e., at or below the FDR threshold).

Please note: “Warning: One or both parameters are on the limit of the defined parameter space” indicates that either too many or too few nodes have statistically significant p-values (FDR-adjusted p > 0.05). This suggests that rather than the null uniform-beta mixture distribution, the actual distribution is too close to either the uniform or the beta distribution alone. The user should expand the grid of possible values to be closer to the boundary, using the optim.method, optim.lower, optim.upper, optim.control, and optim.hessian parameters of the uniform.beta.node.edge.score() function.

FDR.node=0.1
FDR.edge=0.05
networkPrep = uniform.beta.node.edge.score(pval.node = node_p_vals, pval.edge = edge_p_vals, FDR.node = FDR.node, FDR.edge = FDR.edge, dat = lateStageSmall, optim.method="Nelder-Mead", optim.lower = -Inf, optim.upper = Inf, optim.control = list(), optim.hessian = FALSE)

hist(node_p_vals, breaks = 50)

#check names of pvals
#names(node_p_vals)
#names(edge_p_vals)

Running module detection

The MultiModuleFind function is used to find active subnetworks. It is an iterative algorithm that finds the largest scoring subnetwork, then removes it from the network and repeats the process. The number of modules ncluster must be prespecified. As above, please note that “Warning: No positive nodes” indicates that the function completed algorithm execution before reaching ncluster and failed to identify any further optimized or active subnetworks. Here, as with the simulated data above, we compare our proposed method based on both node scores and edge scores with Dittrich’s method using node scores only.

#all possible optimized subnetworks in simulated network using Dittrich's method using node scores only
modules.n = MultiModuleFind(network = networkPrep$Network,
                          node.scores = networkPrep$NodeScore,
                          edge.scores = networkPrep$EdgeScore,
                          ncluster = 2,
                          method = "NodeOnly")

## Warning in runFastHeinz(network.after.cluster1, node.scores2): No positive
## nodes

#all possible optimized subnetworks in simulated network using our proposed method based on node scores and edge scores
modules.ne = MultiModuleFind(network = networkPrep$Network,
                          node.scores = networkPrep$NodeScore,
                          edge.scores = networkPrep$EdgeScore,
                          ncluster = 2,
                          method = "NodeEdge")

## Warning in runFastHeinz.e(network.after.cluster1, node.scores2, edge.scores2, :
## No positive nodes

Examine results

modules.ne

## [[1]]
## IGRAPH 8c29a26 UN-- 11 55 -- 
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 8c29a26 (vertex names):
##  [1] SIRPB1      --STAT1    IGKV1OR2.108--PHF21B   CD163       --PODXL   
##  [4] ARFRP1      --SIRPB1   ARFRP1      --STAT1    DCXR        --SIRPB1  
##  [7] DCXR        --STAT1    SIRPB1      --GPR114   STAT1       --GPR114  
## [10] IGKV1OR2.108--SIRPB1   IGKV1OR2.108--STAT1    PHF21B      --SIRPB1  
## [13] PHF21B      --STAT1    SIRPB1      --TNFRSF17 STAT1       --TNFRSF17
## [16] SIRPB1      --TMEM140  STAT1       --TMEM140  PODXL       --SIRPB1  
## [19] PODXL       --STAT1    CD163       --SIRPB1   CD163       --STAT1   
## [22] ARFRP1      --DCXR     ARFRP1      --GPR114  
## + ... omitted several edges

Note that if a null result where the number of nodes and edges are both zero is returned, you may need to either increase the sample size of genes being used or relax the FDR in order to find active subnetworks. Increasing the sample size of genes will give more possible combinations of subnetwork candidates, and relaxing the FDR will allow subnetworks that are less significant to be detected.

This is a list of edges that shows the connected vertices in the active subnetwork that was detected. Note that the number of nodes is 46, and the number of edges is \(46 \choose 2\) = 1035, the maximum number for a network with 46 nodes.

Note that if more than one active subnetwork is detected, that second network’s information can be accessed by replacing the “[[1]]” below with “[[2]]”, and so on for successive subnetworks.

#list of active subnetworks detected using Dittrich's method (node scores only)
modules.n

## [[1]]
## IGRAPH 261ba58 UN-- 10 45 -- 
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 261ba58 (vertex names):
##  [1] ARFRP1      --CD163        ARFRP1      --DCXR        
##  [3] CD163       --DCXR         ARFRP1      --IGKV1OR2.108
##  [5] CD163       --IGKV1OR2.108 DCXR        --IGKV1OR2.108
##  [7] ARFRP1      --PHF21B       CD163       --PHF21B      
##  [9] DCXR        --PHF21B       IGKV1OR2.108--PHF21B      
## [11] ARFRP1      --PODXL        CD163       --PODXL       
## [13] DCXR        --PODXL        IGKV1OR2.108--PODXL       
## [15] PHF21B      --PODXL        ARFRP1      --SIRPB1      
## + ... omitted several edges

#list of active subnetworks detected using our proposed method (node scores and edge scores)
modules.ne

## [[1]]
## IGRAPH 8c29a26 UN-- 11 55 -- 
## + attr: name (v/c), score (v/n), score (e/n), name (e/c)
## + edges from 8c29a26 (vertex names):
##  [1] SIRPB1      --STAT1    IGKV1OR2.108--PHF21B   CD163       --PODXL   
##  [4] ARFRP1      --SIRPB1   ARFRP1      --STAT1    DCXR        --SIRPB1  
##  [7] DCXR        --STAT1    SIRPB1      --GPR114   STAT1       --GPR114  
## [10] IGKV1OR2.108--SIRPB1   IGKV1OR2.108--STAT1    PHF21B      --SIRPB1  
## [13] PHF21B      --STAT1    SIRPB1      --TNFRSF17 STAT1       --TNFRSF17
## [16] SIRPB1      --TMEM140  STAT1       --TMEM140  PODXL       --SIRPB1  
## [19] PODXL       --STAT1    CD163       --SIRPB1   CD163       --STAT1   
## [22] ARFRP1      --DCXR     ARFRP1      --GPR114  
## + ... omitted several edges

Explore network information:

#node scores
V(modules.ne[[1]])$score

##  [1]  0.32868358  0.53678782  0.01308877  2.64488587  2.29957275  1.64071619
##  [7]  1.59621002  0.71437065  1.23938062  1.65049686 -1.05252907

#edge scores
E(modules.ne[[1]])$score

##  [1]  0.1500247  0.1500247 -0.6897466 -2.1920527 -1.7050822 -1.7876402
##  [7] -2.0438967  0.1500247  0.1500247  0.1500247  0.1500247 -0.2116451
## [13]  0.1500247  0.1500247  0.1500247  0.1500247  0.1500247 -0.2116451
## [19] -0.2116451  0.1500247  0.1500247  0.1500247  0.1500247 -2.1978826
## [25] -0.6897466 -2.0438967 -1.8698877 -1.5886505 -2.2149917  0.1500247
## [31] -1.5295179 -1.7615458 -1.1465480  0.1500247  0.1500247  0.1500247
## [37]  0.1500247  0.1500247  0.1500247  0.1500247 -0.2116451  0.1500247
## [43]  0.1500247  0.1500247  0.1500247  0.1500247 -0.2116451 -0.2116451
## [49]  0.1500247  0.1500247  0.1500247  0.1500247  0.1500247 -1.1465480
## [55]  0.1500247

#node degree
igraph::degree(modules.ne[[1]])

##       ARFRP1        CD163         DCXR IGKV1OR2.108       PHF21B        PODXL 
##           10           10           10           10           10           10 
##       SIRPB1        STAT1      TMEM140     TNFRSF17       GPR114 
##           10           10           10           10           10

Visualize Network Results as above

library(visNetwork)
multi.mod.n <- modules.n
dat_node<-multi.mod.n[[1]]
 E(dat_node)$dashes<-ifelse(E(dat_node)$score>0,FALSE,FALSE)
  E(dat_node)$width<-(E(dat_node)$score+1.5)*0.3
  V(dat_node)$color<-ifelse(V(dat_node)$score>0,"#E7298A","grey")
  V(dat_node)$size<-(V(dat_node)$score+11)*0.5
  
  E(dat_node)$color<-ifelse(E(dat_node)$score>0,"#66A61E","grey")
 
  data_node<- toVisNetworkData(dat_node)
  
  visNetwork(nodes = data_node$nodes, edges = data_node$edges, height = "800px")%>% visNodes(font=list(size=20))

multi.mod.ne <- modules.ne
dat_nodeedge<-multi.mod.ne[[1]]
 E(dat_nodeedge)$dashes<-ifelse(E(dat_nodeedge)$score>0,FALSE,FALSE)
  E(dat_nodeedge)$width<-(E(dat_nodeedge)$score+1.5)*0.3
  V(dat_nodeedge)$color<-ifelse(V(dat_nodeedge)$score>0,"#E7298A","grey")
  V(dat_nodeedge)$size<-(V(dat_nodeedge)$score+11)*0.5
  
  E(dat_nodeedge)$color<-ifelse(E(dat_nodeedge)$score>0,"#66A61E","grey")

  
  data_nodeedge <- toVisNetworkData(dat_nodeedge)
  
  visNetwork(nodes = data_nodeedge$nodes, edges = data_nodeedge$edges, height = "800px")%>% visNodes(font=list(size=20))

Runtime dependence on FDR and Number of Genes

end.time <- Sys.time()
time.taken <- end.time - start.time
print(c("FDR node: ", FDR.node))

## [1] "FDR node: " "0.1"

print(c("FDR edge: ", FDR.edge))

## [1] "FDR edge: " "0.05"

print(c("Number of Genes: ", setSize))

## [1] "Number of Genes: " "500"

print(c("Runtime: ", time.taken))

## [1] "Runtime: "        "3.96293711662292"

print(c("Number of Nodes in Active Subnetwork: ", summary(modules.ne)[,1]))

##                                          
## "Number of Nodes in Active Subnetwork: " 
##                                   Length 
##                                     "11"

Selected runtime examples from testing

The two plots below show the impact of the number of genes and the selected FDR values on the runtime. The first plot highlights that the runtime of specific wActNet functions increases as the user increases the number of input genes.

library(tidyverse)
library(ggplot2)

NumGenes <- c(100, 300, 500, 1000)
pval.perm.corr <- c(1, 4, 12, 42)
MultiModuleFind <- c(.1, 2, 5, 196) #minutes

runtime_dat <- as.data.frame(cbind(NumGenes, pval.perm.corr, MultiModuleFind))

runtime_dat %>% pivot_longer(2:3, names_to = "Function") %>% ggplot() + geom_point(mapping = aes(x=NumGenes, y = value, color = Function, size = 3), show.legend = c(color=T,size=F)) + theme(plot.caption = element_text(hjust = 0)) + labs(title = "Runtime vs. Number of Genes for Two wActNet Functions", x = "Number of Genes", y = "Runtime in Minutes", caption = str_wrap("This plot highlights how the runtime of specific wActNet functions increases as the user increases the number of input genes. The `MultiModuleFind` function is an iterative algorithm used to find active subnetworks. The `pval.perm.corr` function performs a permutation test on a correlation network to generate edge p-values.", 100))

FDR <- c(0.05, 0.1, 0.5)
runtimes100genes <- c(36/60, 36/60, 36/60) #seconds/(60s/min) = min
runtimes500genes <- c(15, 21, 48) #minutes

runtime_dat <- as.data.frame(cbind(FDR, runtimes100genes, runtimes500genes))

runtime_dat %>% pivot_longer(2:3, names_to = "Number of Genes") %>% ggplot() + geom_point(mapping = aes(x=FDR, y = value, color = `Number of Genes`, size = 3), show.legend = c(color=T,size=F)) + theme(plot.caption = element_text(hjust = 0)) + labs(title = "Runtime vs. FDR for full script", x = "FDR", y = "Runtime in Minutes", caption = str_wrap("The second plot shows runtimes of a full script, including Dittrich's method and our proposed method, as well as visualizations for both of their respective network outputs. This plot highlights that relaxing the FDR (which allows more nodes to pass the significance threshold, and therefore creates more complex networks) results in longer runtimes.", 100))

Contact Us / Contribute

This package is new, and any and all suggestions are welcomed. You can use GitHub to raise issues, contribute, or communicate with us about the package:

https://github.com/yuy113/wActNet/

References

Dittrich, M., Klau, G., Rosenwald, A., Dandekar, T., Müller, T.: Identifying functional modules in protein-protein interaction networks: an integrated exact approach. Bioinformatics 24(13), 223–231 (2008)

Beisser, D., Klau, G., Dandekar, T., Müller, T., Dittrich, M.: Bionet: an r-package for the functional analysis of biological networks. Bioinformatics 26(8), 1129–1130 (2010)

Ganzfried BF, Riester M, Haibe-Kains B, Risch T, Tyekucheva S, Jazic I, Wang XV, Ahmadifar M, Birrer M, Parmigiani G, Huttenhower C, Waldron L: curatedOvarianData: Clinically Annotated Data for the Ovarian Cancer Transcriptome. Database, Volume 2013. doi:10.1093/database/bat013, https://academic.oup.com/database/article/doi/10.1093/database/bat013/330978