An Analytical Survey on Diagnosis Algorithm in Generic Defect Large-Scale Failures in Computer Networks

Jaychand Vishwakarma, Sakil Ahmad Ansari


We present a framework and a set of algorithms for diagnosing faults in networks during large-scale outages. The design of our algorithm, netCSI, is motivated by the observation that failures in such outages are geographically clustered. We address the challenge of diagnosing faults from incomplete symptom information when only a limited number of nodes report. netCSI consists of two parts: a hypothesis generation algorithm and a ranking algorithm. When constructing the list of hypotheses about potential causes, we make novel use of positive and negative symptoms to improve the precision of the results. In addition, we propose pruning and thresholding, along with a dynamic threshold value selector, to reduce the complexity of our algorithm. The ranking algorithm is based on conditional failure probability models that account for the geographic correlation of network objects in clustered failures. We evaluate netCSI on networks with both random and realistic topologies. Compared with an existing fault diagnosis algorithm, MAX-COVERAGE, netCSI achieves an average gain of 128 percent in accuracy for realistic topologies.
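The two-stage structure described in the abstract (generate hypotheses, then rank them) can be illustrated with a minimal sketch. This is not the netCSI algorithm itself: the function names, the `max_size` cap standing in for pruning, and the distance-based score standing in for the conditional failure probability model are all assumptions made for illustration. The sketch takes observations of failed paths and working paths; which of these the abstract calls positive and which negative symptoms is not stated here, so neutral names are used.

```python
from itertools import combinations
from math import dist


def generate_hypotheses(failed_paths, working_paths, max_size=2):
    """Return candidate fault sets that explain every failed path.

    Components seen on any working path are assumed healthy and excluded.
    max_size is a hypothetical cap that keeps the search small.
    """
    healthy = set().union(*working_paths) if working_paths else set()
    suspects = sorted(set().union(*failed_paths) - healthy)
    hypotheses = []
    for size in range(1, max_size + 1):
        for combo in combinations(suspects, size):
            chosen = set(combo)
            # A hypothesis must touch every failed path to explain it.
            if all(chosen & set(path) for path in failed_paths):
                hypotheses.append(frozenset(chosen))
    return hypotheses


def spread(hypothesis, coords):
    """Mean pairwise distance between the components of a hypothesis."""
    points = [coords[c] for c in hypothesis]
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def rank_hypotheses(hypotheses, coords):
    """Order hypotheses so geographically tighter ones come first.

    Illustrative stand-in only: a smaller spread is treated as more
    plausible under the assumption that failures are clustered.
    """
    return sorted(hypotheses, key=lambda h: (spread(h, coords), len(h), sorted(h)))


# Hypothetical toy data: component names and planar coordinates.
failed = [{"a", "b"}, {"c", "d"}]
working = [{"b", "e"}]
coords = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (5, 0), "e": (2, 2)}

ranked = rank_hypotheses(generate_hypotheses(failed, working), coords)
print([sorted(h) for h in ranked])
```

In this toy case, component b is cleared by the working path, so no single component explains both failed paths, and the candidates are pairs. The pair whose members lie closer together is ranked first, which mirrors the clustering intuition in the abstract without reproducing its probability model.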








© International Journals of Advanced Research in Computer Science and Software Engineering (IJARCSSE)| All Rights Reserved | Powered by Advance Academic Publisher.