id: 01129899 dt: a an: 01129899 au: Strumpen, Volker; Ramkumar, Balkrishna ti: Portable checkpointing for heterogeneous architectures. so: Avresky, Dimiter R. (ed.) et al., Fault-tolerant parallel and distributed systems. Selected and revised articles at the IEEE workshops, FTPDS ’98, Hawaii, Honolulu 1996 and Geneva, Switzerland, 1997. Boston: Kluwer Academic Publishers. 73-91 (1998). py: 1998 pu: Boston: Kluwer Academic Publishers la: EN cc: ut: checkpointing; compilation techniques ci: li: ab: Summary: Current approaches to checkpointing assume system homogeneity, where checkpointing and recovery are both performed on the same processor architecture and operating system configuration. Sometimes it is desirable or necessary to recover a failed computation on a different processor architecture. For such situations, checkpointing and recovery must be portable. We argue that source-to-source compilation is an appropriate concept for this purpose. We describe the compilation techniques developed for the design of the c2ftc prototype, which enables machine-independent checkpoints by automatic generation of checkpointing and recovery code. Sequential C programs are compiled into fault-tolerant C programs, whose checkpoints can be migrated across heterogeneous networks and restarted on binary-incompatible architectures. Experimental results on several systems provide evidence that the performance penalty of portable checkpointing is negligible for realistic checkpointing frequencies. rv: