Hubsm: A Novel Amino Acid Substitution Matrix for Comparing Hub Proteins

Renganayaki G., Achuthsankar S. Nair


Sequence alignment algorithms and database search methods typically use the BLOSUM and PAM substitution matrices, which were constructed from general proteins. These de facto standard matrices are not optimal for accurately aligning proteins whose amino acid composition is markedly biased. In this work, a new amino acid substitution matrix, Hubsm, is calculated for the disorder-rich and low-complexity-rich regions of Hub proteins, based on residue characteristics. The amino acid background frequencies and substitution scores obtained from Hubsm reveal residue substitution patterns that differ from those of commonly used scoring matrices. When Hub protein sequences are compared to detect homologs, Hubsm yields better results than the PAM and BLOSUM matrices. Hubsm may therefore be well suited to database searches and to the construction of more accurate sequence alignments of Hub proteins.
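The abstract does not spell out how substitution scores are obtained from background frequencies. As a rough illustration only, the sketch below uses the generic BLOSUM-style log-odds recipe (observed pair frequency divided by the frequency expected from background composition). It is an assumption for illustration and not necessarily the procedure used to build Hubsm; the function name `log_odds_matrix`, the `scale` parameter, and the toy alignment columns are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns, scale=2.0):
    """Generic BLOSUM-style log-odds scores from gap-free alignment columns.

    columns: iterable of strings, one string per aligned column.
    Returns (scores, background): scores maps a sorted residue pair (a, b)
    to a rounded score in units of 1/scale bits; background maps each
    residue to its estimated frequency.
    """
    # Count every unordered residue pair observed within each column.
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency: p_a = q_aa + (1/2) * sum over b != a of q_ab.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    # Score = scale * log2(observed / expected), rounded to an integer.
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(freq / expected))
    return scores, dict(p)


if __name__ == "__main__":
    # Toy columns standing in for aligned disordered regions (made-up data).
    toy_scores, toy_background = log_odds_matrix(["AAA", "SSS", "AAS"])
    print(toy_background)
    print(toy_scores)
```

Under this recipe, a region-specific matrix differs from a general-purpose one simply because the pair counts and background frequencies come from a compositionally different set of sequences.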


© International Journals of Advanced Research in Computer Science and Software Engineering (IJARCSSE)| All Rights Reserved | Powered by Advance Academic Publisher.