Result 1 to 20 of 46 total
High availability on cloud with HA-OSCAR. (English)
Alexander, Michael (ed.) et al., Euro-Par 2011: Parallel processing workshops. CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29‒September 2, 2011. Revised selected papers, Part II. Berlin: Springer (ISBN 978-3-642-29739-7/pbk). Lecture Notes in Computer Science 7156, 292-301 (2012).
1
io-port 06097230 Brandt, J.; Chen, F.; Gentile, A.; Leangsuksun, Chokchai (Box); Mayo, J.; Pebay, P.; Roe, D.; Taerat, N.; Thompson, D.; Wong, M.
Framework for enabling system understanding. (English)
Alexander, Michael (ed.) et al., Euro-Par 2011: Parallel processing workshops. CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29‒September 2, 2011. Revised selected papers, Part II. Berlin: Springer (ISBN 978-3-642-29739-7/pbk). Lecture Notes in Computer Science 7156, 231-240 (2012).
2
Workshop on resiliency in high performance computing (resilience) in clusters, clouds, and grids. (English)
Alexander, Michael (ed.) et al., Euro-Par 2011: Parallel processing workshops. CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29‒September 2, 2011. Revised selected papers, Part II. Berlin: Springer (ISBN 978-3-642-29739-7/pbk). Lecture Notes in Computer Science 7156, 209 (2012).
3
Baler: deterministic, lossless log message clustering tool (English)
Computer Science - R&D 26, No. 3-4, 285-295 (2011).
4
Incremental checkpoint schemes for Weibull failure distribution. (English)
Int. J. Found. Comput. Sci. 21, No. 3, 329-344 (2010).
5
Reliability of a system of k nodes for high performance computing applications (English)
IEEE Transactions on Reliability 59, No. 1, 162-169 (2010).
6
Proficiency metrics for failure prediction in high performance computing (English)
ISPA, 491-498 (2010).
7
Benefits of software rejuvenation on HPC systems (English)
ISPA, 499-506 (2010).
8
VCCP: A transparent, coordinated checkpointing system for virtualization-based cluster computing (English)
CLUSTER, 1-10 (2009).
9
HPC failure prediction proficiency metrics (English)
CLUSTER, 1-4 (2009).
10
Blue Gene/L log analysis and time to interrupt estimation (English)
ARES, 173-180 (2009).
11
io-port 71029525 Scott, Stephen L.; Engelmann, Christian; Vallée, Geoffroy; Naughton, Thomas; Tikotekar, Anand; Ostrouchov, George; Leangsuksun, Chokchai; Naksinehaboon, Nichamon; Nassar, Raja; Paun, Mihaela; Mueller, Frank; Wang, Chao; Nagarajan, Arun Babu; Varma, Jyothish
A tunable holistic resiliency approach for high-performance computing systems (English)
PPOPP, 305-306 (2009).
12
Reliability-aware approach: an incremental checkpoint/restart model in HPC environments (English)
CCGRID, 783-788 (2008).
13
Symmetric active/active high availability for high-performance computing system services: accomplishments and limitations (English)
CCGRID, 813-818 (2008).
14
An optimal checkpoint/restart model for a large scale high performance computing system (English)
IPDPS, 1-9 (2008).
15
Symmetric active/active replication for dependent services (English)
ARES, 260-267 (2008).
16
A framework for proactive fault tolerance (English)
ARES, 659-664 (2008).
17
Transparent symmetric active/active replication for service-level high availability (English)
CCGRID, 755-760 (2007).
18
A reliability-aware approach for an optimal checkpoint/restart model in HPC environments (English)
CLUSTER, 452-457 (2007).
19
Reliability-aware resource allocation in HPC systems (English)
CLUSTER, 312-321 (2007).
20