Main Article Content

Abstract

Network-on-chip (NoC) multi-core architectures with a large number of processing elements are becoming a reality with the recent developments in technology. In these modern systems the processing elements are interconnected with regular NoC topologies such as meshes and tori. In this paper we propose a parallel Gauss-Seidel (GS) iterative algorithm for solving large systems of linear equations on a 3-dimensional torus NoC architecture. The proposed parallel algorithm is O(Nn2/k3) time complexity for solving a system with a matrix of order n on a k×k×k 3D torus NoC architecture with N iterations assuming n and N are large compared to k. We show that under these conditions the proposed parallel GS algorithm has near optimal speedup.

 

 

Keywords

Network-on-chip 3D torus Parallel algorithm Linear system of equations Gauss-Seidel method.

Article Details

References

  1. Adams, L. and Xie, D. New Parallel SOR Method by Domain Partitioning, SIAM J. Sci. Comp., 1999, 20(22), 2261-2281.
  2. Krystynak, J. and Nitzberg, B. Performance Characteristics of the iPSC/860 and CM-2 I/O Systems, Proceedings of the Seventh International Parallel Processing Symposium, Newport, CA, 13-16 April 1993, pp. 837-841.
  3. Adams, M.F. A Distributed Memory Unstructured Gauss-Seidel Algorithm for Multigrid Smoothers, Proc. of 2001 ACM/IEEE Conference on Supercomputing.
  4. Hu, C., Zang, J., Wang, J., Li, J. and Ding, L. A New Parallel Gauss-Seidel Method by Iterative Space Alternate Tiling, 16th international Conference on Parallel Architecture and Compilation Techniques, PACT 2007, 15-19 September.
  5. Murugan, M., Sridhar, S. and Arvindam, S. A Parallel Implementation of the Gauss-Seidel Method on the Flosolver, Technical Report, National Aeronautical Labaratory, Bangalor, India, July 2006.
  6. Olszewski, L. A Timing Comparison of the Conjugate Gradient and Gauss-Siedel Parallel Algorithms in a One-dimensional Flow Equation Using PVM, Proc. of the 33rd Annual Southeast Regional Conference, Clemson, South Carolina, 1995, pp. 205-212.
  7. Thongkrajay, U. and Kulworawanichpong, T. Convergence Improvement of Gauss-Seidel Power Flow Solution Using Load Transfer Technique, Proceeding (296) Modeling, Identification, and Control-2008, Feb. 11-13, 2008, Innsbruck, Austria.
  8. Wallin, D., Lof, H., Hagersten, E. and Holmgren, S. Multigrid and Gauss-Seidel Smoothers revisited: Parallelization on Chip Multiprocessors, Proceedings of ICS06 Conference, June 28-30, 2006, Cairns, Queensland, Australia.
  9. Fox, G., Johnson, M., Lyzanga, G., Otto, S., Salmon, J. and Walker, D. Solving Problems on Concurrent Processors, Printice Hall, 1988.
  10. Golub, G. and J.M. Ortega Scientific Computing with an Introduction to Parallel Computing, Academic Press, Boston, MA, 1993.
  11. Saleh, R.A., Gallivan, K.A., Chang, M., Hajj, I.N., Smart, D. and Trich, T.N. Parallel Circuit Simulation on Supercomputers, Proc. of the IEEE, 11989, 77(12), 1915-1930.
  12. Wallch, Y. Calculations and Programs for Power System Networks, Printice Hall, 1986.
  13. Courtecuisse, H. and Allard, J. Parallel Dense Gauss-Seidel Algorithm on Many-Core Processors, High Performance Computation Conference (HPCC), IEEE Press, June 2009.
  14. Wang, F. and Xu, J. A crosswind Block Iterative Method for Convection-Dominated problems, SIAM J. Sci. Comp., 1999, 21(2), 620-645.
  15. Grabel, J., Land, B. and Uebertholz, P. Performance Optimization for the Parallel Gauss-Seidel Smoother, PAMM, 2005, 5(1), 831-832.
  16. Nobuhiko, O., Takeshi, M. Takeshi, I. and Masaaki, S. A Parallel Block Gauss-Seidel Smoother for Algebraic Multigrid Method in Edge-Element Analysis, Papers of Technical meeting on Static Apparatus, IEE Japan, 2006, Vol. 6, no. 58-61, 63-75, pp. 55-60.
  17. Benini, L. and Micheli, G.D. Networks on Chips: Technology and Tools, Morgan Kaufmann, 2006.
  18. Al-Towaiq, M. and Day, K. Parallel Gauss-Seidel on a Torus Network-On-Chip Architecture, J. Interconnection Networks, 2012, 13(1, 2), 1-14.
  19. Day, K. and Al-Towaiq, M. A Parallel Gauss-Seidel Algorithm on a 3D Torus Network-on-Chip Architecture, 10th HiPEAC Conference on High Performance and Embedded Architecture and Compilers, January 19-21, 2015, Amsterdam, Netherlands.
  20. Taylor, M.B., Lee, W., Amarasinghe, S. and Agarwal, A. Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures, Int’l Symposium on High-Performance Computer Architecture (HPCA), 2003, pp. 341-353, Anaheim, California.
  21. Gratz, P., Kim, C., Sankaralingam, K., Hanson, H., Shivakumar, P. and Burger, S.K. On-Chip Interconnection Networks of the Trips Chip, IEEE Micro, 2007, 27(5), 41-50.
  22. Hoskote, Y., Vangal, S.R., Singh, A., Borkar, N. and Borkar, S. A 5-GHZ Mesh Interconnect for a Teraflops Processor, IEEE Micro, 2007, 27(5), 51-61.
  23. Ramey, C. TILE-Gx100 ManyCoreProcessor: Acceleration Interfaces and Architecture, Hot Chips 23, Stanford, CA, August 17, 2011.
  24. Howard , J. et. al, A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling, IEEE Journal of Solid-State Circuits, January 2011, 46(1), 173-183.
  25. Feero, B.S. and Pande, P.P. Networks-on-Chip in a Three-Dimensional Environment: A Performance Evaluation, IEEE Trans. on Computers, January 2009, 58(1), 32-45.
  26. Pavlidis, V.F and Friedman, E.G. 3-D Topologies for Networks-on-Chip, IEEE TVLSI, 2007, 15(10), 1081-1090.
  27. Marcon, C. et. al., Tiny NoC: A 3D Mesh Topology with Router Channel Optimization for Area and Latency Minimization, Proc. 27th International Conference on VLSI Design, 5-9 Jan 2014, Mumbai, India, 2014, pp. 228-233.
  28. Kourdy, R. and Nouri, M.R. Performance Comparison of 2D and 3D Torus Network-on-Chip Architectures, Journal of Computing, 2012, 4(2), 119-122.