Yale University Gerstein Lab

Positive Selection at the Protein Network Periphery: Evaluation in Terms of Structural Constraints and Cellular Context

Philip M. Kim*, Jan O. Korbel* and Mark B. Gerstein

To whom correspondence should be addressed: p.kim(a)yale.edu, jan.korbel(a)yale.edu, mark.gerstein(a)yale.edu


Abstract:

Because of recent advances in genotyping and sequencing, human genetic variation and adaptive evolution in the primate lineage have become major research foci. Here we examine the relationship between genetic signatures of adaptive evolution and network topology. We find a striking tendency of proteins that have been under positive selection (as compared to the chimpanzee) to be located at the periphery of the interaction network. Our results are based on the analysis of two types of genome evolution, each considered in terms of both intra- and inter-species variation. First, we examine Single-Nucleotide Polymorphisms (SNPs) and their fixed variants, single-nucleotide differences in the human genome relative to the chimpanzee genome. Second, we examine fixed structural variants, specifically large segmental duplications and their polymorphic precursors known as Copy Number Variants (CNVs). We propose two complementary mechanisms that lead to the observed trends. First, we can rationalize them in terms of constraints imposed by protein structure: we find that positively selected sites are preferentially located on the surface of proteins. Since central network proteins (hubs) are likely to have a larger fraction of their surface involved in interactions, they tend to be constrained and under negative selection. Second, we show that the periphery of the interaction network roughly maps to the cellular periphery (i.e., extracellular space or cell membrane). This suggests that the observed positive selection at the network periphery may be due to an increase in adaptive events at the cellular periphery in response to changing environments.


Supplementary Data Files:

Each data file has an info file for annotation of the columns.

Download all supplementary data here.

1. Gene information (Gene length, SNPs etc.): data, info

2. Different estimations for positive selection: data, info

3. Segmental Duplications intersecting with the genes: data, info

4. Network centrality statistics (Degree and Betweenness) for the protein products: data, info

5. Intersecting CNVs: data, info
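
For readers unfamiliar with the two centrality measures in file 4, the following is a minimal sketch of how degree and betweenness can be computed for an undirected, unweighted interaction network. It is not the code used to produce the data files. The function name, the toy edge list, and the protein labels are hypothetical, and betweenness is computed with Brandes' algorithm as one common choice, without normalization.

```python
from collections import deque


def degree_and_betweenness(edges):
    """Return (degree, betweenness) dicts for an undirected, unweighted graph."""
    # Build an adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Degree: number of interaction partners of each node.
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    betweenness = {n: 0.0 for n in adj}

    # Brandes' algorithm: one breadth-first search per source node.
    for s in adj:
        stack = []                      # nodes in order of non-decreasing distance
        preds = {n: [] for n in adj}    # predecessors on shortest paths from s
        sigma = {n: 0 for n in adj}     # number of shortest paths from s
        dist = {n: -1 for n in adj}     # -1 marks "not yet visited"
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate path dependencies, farthest nodes first.
        delta = {n: 0.0 for n in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]

    # Each undirected shortest path was counted once from each endpoint.
    for n in betweenness:
        betweenness[n] /= 2.0
    return degree, betweenness


# Toy network (hypothetical names): HUB binds P1, P2 and P3; P4 hangs off P3.
toy_edges = [("HUB", "P1"), ("HUB", "P2"), ("HUB", "P3"), ("P3", "P4")]
degree, betweenness = degree_and_betweenness(toy_edges)
for name in sorted(degree):
    print(name, degree[name], betweenness[name])
```

In this toy network the central node has the highest degree and betweenness, while the nodes with a single partner have a betweenness of zero; those low-centrality nodes are what the abstract calls the network periphery.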


Last modified on Dec. 2nd, 2007