Research Supervisor Connect

Healing and Self-Repair in Large Scale Distributed Computing Systems


The project will focus on developing fault-tolerance mechanisms that allow distributed systems to operate under a range of conditions.


Professor Albert Y. Zomaya.

Research location

Computer Science

Program type



As the complexity of distributed systems increases over time, such systems will need capabilities that allow them to keep operating in disaster scenarios. The problem is made especially complex by the heterogeneous nature of today’s distributed computing environments, which can consist of hundreds or thousands of components (computers, databases, etc.). In addition, a user in one location might have no control over other parts of the system. There is therefore a need for “smart” algorithms (protocols) that can achieve an acceptable level of fault tolerance and account for a variety of disaster recovery scenarios.


Opportunity ID

The opportunity ID for this research opportunity is 978.
