Supporting data for Mammalian GGI network evolution

1. Gene-Gene interaction (GGI) datasets:

In this study, we exploited four types of GGI datasets: 1) Human PPI dataset from HIPPIE; 2) Human PPI dataset from HPRD; 3) Human gene-coexpression data; 4) Mouse PPI dataset.

1) Human PPI dataset from HIPPIE:

Download from website OR Download here

Human protein-protein interaction data were extracted and rescored from the October 11, 2013 release of the Database of Human Integrated Protein-Protein Interaction rEference (HIPPIE), which integrates 18 public protein interaction data sources. Each interaction was assigned a confidence score according to the number and quality of the experimental techniques used to detect it, and to interolog cases in other model organisms. To avoid missing species-specific interactions, the filtering parameter for interologs in model organisms was omitted. A medium confidence level (0.68, the median of the score distribution) was set as the threshold, and interactions with confidence scores greater than this cut-off were retrieved. Self-interactions were excluded. To reduce the bias from an arbitrary choice of cut-off, we also reconstructed a second human PPI network with a stricter confidence score threshold (0.77).
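The filtering step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the record layout (gene_a, gene_b, score), the function name filter_interactions and the toy records are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (gene_a, gene_b, confidence_score).
Interaction = Tuple[str, str, float]


def filter_interactions(
    interactions: Iterable[Interaction], threshold: float = 0.68
) -> List[Interaction]:
    """Keep pairs scoring strictly above the threshold, dropping self-interactions."""
    kept = []
    for gene_a, gene_b, score in interactions:
        if gene_a == gene_b:
            continue  # self-interaction, excluded
        if score > threshold:
            kept.append((gene_a, gene_b, score))
    return kept


# Toy records invented for illustration only.
toy = [
    ("G1", "G2", 0.90),
    ("G1", "G1", 0.95),  # self-interaction
    ("G2", "G3", 0.68),  # equal to the cut-off, so not retained
    ("G3", "G4", 0.72),
]

medium_network = filter_interactions(toy, threshold=0.68)
strict_network = filter_interactions(toy, threshold=0.77)
```

Running the same function with both thresholds gives the medium-confidence and the stricter network from one input table.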

2) Human PPI dataset from HPRD:

Download from website OR Download here

We also used another manually curated human PPI dataset, the Human Protein Reference Database (HPRD, release 9). As above, self-interactions were removed from this dataset, and only non-redundant interactions were retained.
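One way to obtain a non-redundant interaction list is sketched below. It assumes that interactions are undirected, so that A-B and B-A count as the same pair; the text does not state this explicitly. The function name and the toy pairs are hypothetical.

```python
from typing import Iterable, List, Tuple


def non_redundant_pairs(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop self-interactions and collapse duplicate pairs.

    Assumption: interactions are undirected, so each pair is stored
    with its two gene names in sorted order before deduplication.
    """
    unique = set()
    for gene_a, gene_b in pairs:
        if gene_a == gene_b:
            continue  # self-interaction
        unique.add(tuple(sorted((gene_a, gene_b))))
    return sorted(unique)


# Toy pairs invented for illustration only.
toy_pairs = [("G2", "G1"), ("G1", "G2"), ("G3", "G3"), ("G3", "G4")]
cleaned = non_redundant_pairs(toy_pairs)
```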

3) Human gene-coexpression data from COXPRESdb:

Download from website OR Download here

Based on gene expression profiling data for 65 human tissues collected from a public co-expressed gene database (COXPRESdb v5.0), we constructed a human gene co-expression (GC) network from the associations between the expression profiles of gene pairs, measured by Pearson correlation coefficients (PCC). The PCC cut-off was set to 0.4.
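The construction of a co-expression network from a PCC cut-off can be sketched as follows. This is an illustration under stated assumptions: an edge is drawn when the PCC is greater than or equal to the cut-off, and only positive correlations are considered. Neither detail is specified in the text. The function names and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # PCC is undefined for a flat profile; treated here as no correlation
    return cov / sqrt(var_x * var_y)


def coexpression_edges(
    profiles: Dict[str, Sequence[float]], cutoff: float = 0.4
) -> List[Tuple[str, str]]:
    """Return gene pairs whose PCC is at or above the cut-off (assumed convention)."""
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        if pearson(profiles[gene_a], profiles[gene_b]) >= cutoff:
            edges.append((gene_a, gene_b))
    return edges


# Toy expression profiles across four tissues, invented for illustration only.
toy_profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with G1
    "G3": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with G1 and G2
}
toy_edges = coexpression_edges(toy_profiles)
```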

4) Mouse PPI dataset: Download dataset

Mouse protein-protein interaction data were integrated from five well-collected datasets. Confidence scores were assigned to each interaction as in HIPPIE, again with the interolog filtering parameter omitted as described above. Self-interactions were also excluded from the dataset. As for the human data, a medium confidence score (0.68, the median of the score distribution) was set as the threshold to define reliable interaction pairs.


2. Gene age datasets and origination mechanism data:

1) Gene age dataset from Yong et al:

Download from website OR Download here

Both human and mouse gene age data were retrieved from an earlier study by Yong et al. In brief, each protein-coding gene was dated and assigned to a branch by inferring the presence and absence of orthologs along the vertebrate phylogenetic tree, based on UCSC syntenic genomic alignments. The origination mechanisms of human young genes (primate-specific genes) were taken from the same study. Young genes that originated from DNA-level or RNA-level duplication were annotated as duplication-originating genes; all other young genes were defined as de novo genes.
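The two-way annotation of young genes can be expressed as a small rule. The mechanism labels used below ("dna_duplication", "rna_duplication") are hypothetical placeholders; the actual labels in the source dataset may differ.

```python
# Hypothetical mechanism labels; the source dataset may encode these differently.
DUPLICATION_MECHANISMS = {"dna_duplication", "rna_duplication"}


def origin_class(mechanism: str) -> str:
    """Classify a young gene as duplication-originating or de novo."""
    if mechanism in DUPLICATION_MECHANISMS:
        return "duplication"
    return "de novo"


# Toy annotations invented for illustration only.
toy_mechanisms = {"G1": "dna_duplication", "G2": "rna_duplication", "G3": "unknown"}
toy_classes = {gene: origin_class(m) for gene, m in toy_mechanisms.items()}
```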

2) Gene age dataset from Tomislav et al: Download dataset

We also used a second human gene origin dataset based on phylostratigraphic analysis, which assigns human genes to phylogenetic branches numbered 0 to 18 (shifted to 1-19 in the main text), based on the presence and absence of orthologs in genomes ranging from cellular organisms to primate species.


3. Human gene essentiality data:

Download dataset

Essential genes are defined as genes that are critical for the survival of an organism. In this study, candidate gene essentiality information was collected from four distinct resources: a) genes associated with the most life-threatening diseases, which can cause death before puberty or infertility; b) combinational essential genes detected in large-scale human disease cell lines via RNA interference (RNAi) experiments and the recently emerging CRISPR-Cas9 system; c) functional essential genes collected from independent studies via text-mining methods; d) orthologs of genes that are essential in mouse, as detected by gene knock-out experiments. Finally, the 1,342 genes present in two or more of these datasets were defined as human essential genes.
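The "two or more datasets" rule can be sketched as a simple count of supporting sources per gene. The function name, the variable names and the toy gene sets are hypothetical and stand in for the four resources listed above.

```python
from collections import Counter
from typing import Iterable, Set


def genes_with_support(sources: Iterable[Set[str]], min_sources: int = 2) -> Set[str]:
    """Return genes that appear in at least `min_sources` of the given gene sets."""
    counts = Counter()
    for gene_set in sources:
        counts.update(set(gene_set))  # each source contributes at most one vote per gene
    return {gene for gene, n in counts.items() if n >= min_sources}


# Toy gene sets invented for illustration only.
disease_genes = {"G1", "G2"}
screen_genes = {"G2", "G3"}
text_mined_genes = {"G3", "G4"}
mouse_ortholog_genes = {"G2", "G5"}

toy_essential = genes_with_support(
    [disease_genes, screen_genes, text_mined_genes, mouse_ortholog_genes]
)
```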


4. Human gene expression data:

Download from website OR Download here

The mRNA and protein expression profiling data for human tissues were extracted from the Human Protein Atlas Project (V12), which was launched for systematic exploration of the human proteome. RNA-seq was used to probe the mRNA expression patterns of 20,315 human genes in 27 tissues, and genes with FPKM (fragments per kilobase of exon per million reads mapped) greater than 1.0 were defined as expressed in the given tissue. Antibody-based proteomics was used to profile protein expression for 16,384 human coding genes in 58 tissues, and only proteins with clear bands detected on western blots in the corresponding tissues were defined as expressed.
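The mRNA expression call described above (FPKM greater than 1.0) can be sketched as follows. The function name, the dictionary layout and the toy FPKM values are assumptions made for the example and are not taken from the Human Protein Atlas files.

```python
from typing import Dict, List


def expressed_tissues(fpkm_by_tissue: Dict[str, float], cutoff: float = 1.0) -> List[str]:
    """Return the tissues in which a gene's FPKM is strictly greater than the cut-off."""
    return sorted(tissue for tissue, fpkm in fpkm_by_tissue.items() if fpkm > cutoff)


# Toy FPKM values for one gene, invented for illustration only.
toy_fpkm = {"liver": 12.5, "lung": 1.0, "testis": 0.3, "brain": 4.2}
toy_expressed = expressed_tissues(toy_fpkm)
```

A value exactly equal to 1.0 is not counted as expressed here, matching the "greater than 1.0" wording.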



For questions or suggestions, please contact: