id: 06104385 dt: a an: 06104385 au: Luszczek, Piotr; Dongarra, Jack ti: Reducing the time to tune parallel dense linear algebra routines with partial execution and performance modeling. so: Wyrzykowski, Roman (ed.) et al., Parallel processing and applied mathematics. 9th international conference, PPAM 2011, Torun, Poland, September 11‒14, 2011. Revised selected papers, Part I. Berlin: Springer (ISBN 978-3-642-31463-6/pbk). Lecture Notes in Computer Science 7203, 730-739 (2012). py: 2012 pu: Berlin: Springer la: EN cc: ut: Linear systems; parallel algorithms; modeling techniques ci: li: doi:10.1007/978-3-642-31464-3_74 ab: Summary: We present a modeling framework to accurately predict the time to run dense linear algebra calculations. We report the framework’s accuracy in a number of varied computational environments such as shared memory multicore systems, clusters, and large supercomputing installations with tens of thousands of cores. We also test the accuracy for various algorithms, each of which has different scaling properties and a different tolerance of low-bandwidth/high-latency interconnects. The predictive accuracy is very good, on the order of the measurement accuracy, which makes the method suitable for both dedicated and non-dedicated environments. We also present a practical application of our model to reduce the time required to tune and optimize large parallel runs whose time is dominated by linear algebra computations. We show practical examples of how to apply the methodology to avoid common pitfalls and reduce the influence of measurement errors and the inherent performance variability. rv: