UCI Fault-Tolerant Multicomputer Laboratory



Parallel and distributed systems possess inherent redundancy that can be exploited to achieve fault tolerance. Our research is concerned with the design and evaluation of ultra-reliable and highly available parallel and distributed systems. Specific problems studied include on-line fault diagnosis, group membership, recovery techniques, fault-tolerant routing and multicast, clock synchronization, and reconfiguration in multicomputer systems. Evaluation is both analytical and experimental. Testbeds include a UNIX workstation cluster, a Windows NT cluster, and several commercial parallel computers.


Dept. of Electrical & Computer Engineering.