<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<item>
  <id>01766439</id>
  <dt>j</dt>
  <an>01766439</an>
  <augroup>
    <au>Theobald, Kevin B.</au>
    <au>Kumar, Rishi</au>
    <au>Agrawal, Gagan</au>
    <au>Heber, Gerd</au>
    <au>Thulasiram, Ruppa K.</au>
    <au>Gao, Guang R.</au>
  </augroup>
  <ti>Implementation and evaluation of a communication intensive application on the EARTH multithreaded system.</ti>
  <so>Concurrency and Computation: Practice &amp; Experience 14, No.3, 183-201 (2002).</so>
  <py>2002</py>
  <pu>John Wiley &amp; Sons, Ltd., Chichester</pu>
  <lagroup>
    <la>EN</la>
  </lagroup>
  <ccgroup>
    <cc>C.m</cc>
  </ccgroup>
  <utgroup>
    <ut>matrix vector multiplication</ut>
  </utgroup>
  <cigroup>
  </cigroup>
  <ligroup>
    <li>doi:10.1002/cpe.604</li>
  </ligroup>
  <abgroup>
    <ab>Summary: This paper reports a study of sparse matrix-vector multiplication (MVM) on a parallel computing platform based on a fine-grained multithreaded program execution model. When parallelized without graph partitioning, such sparse MVM computations suffer from a very high communication-to-computation ratio and are well known to scale poorly on traditional distributed-memory machines. The particular multithreaded system we use is the Efficient Architecture for Running THreads (EARTH) model, which can be implemented with off-the-shelf processors. With the Class B input sparse matrix from the NAS CG benchmark (75,000 rows), we attain an absolute speedup of 90 on 120 nodes of a distributed-memory configuration. This is achieved without an inspector/executor, graph partitioning, or any communication-minimization phase, which means that similar results can be expected for adaptive problems as well. The high scalability stems from several characteristics of the EARTH architecture: local synchronization, low communication overhead, the ability to overlap communication and computation, and low context-switching costs.</ab>
    <rv></rv>
  </abgroup>
</item>