Computing in Large-Scale Dynamic Systems

Pruteanu, A.S.

Computing in Large-Scale Dynamic Systems

Title

Computing in Large-Scale Dynamic Systems

Author

Pruteanu, A.S.

Contributor

Langendoen, K.G. (promotor)
Dulman, S.O. (promotor)

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Embedded Software

Date

2013-05-29

Abstract

Software applications developed for large-scale systems have always been difficult to de- velop due to problems caused by the large number of computing devices involved. Above a certain network size (roughly one hundred), necessary services such as code updating, topol- ogy discovery and data dissemination are challenging to operate and thus difficult to design and implement. This is caused by the sheer size of the system in terms of number of devices that have to be managed. Practical aspects such as harsh and unpredictable deployment con- ditions and varying system properties such as topology dynamics due to device mobility and variations of the communication links cannot be ignored. In this thesis, we investigated how distributed computing techniques can be used for detecting some of the specific properties of large-scale dynamic systems. Additionally, novel online algorithms have been presented for detecting some of the detrimental effects such as unpredictable topology dynamics and random link and device failures. We started our research by focusing on ways to cope with topology dynamics caused by device mobility and corresponding techniques for managing the creation of overlays in mobile networks. The ASH clustering algorithm introduced in Chapter 2 copes well with device mobil- ity and is remarkably resilient to topology dynamics in general. It is able to create quasi-static overlays while the underlying devices are mobile. Unlike traditional clustering schemes, ASH does not use device location, and the computation is based solely on 1-hop information. By making use of local interactions, it benefits from the so-called emergent behaviour in complex networked systems to achieve self-stabilisation. The second challenge that we addressed in Chapter 3 was the design of a distributed es- timator for the aggregate quality of the communication links in a large-scale network. Data dissemination protocols employed in wireless systems can use this information in order to ad- just various application behaviours. We have introduced an algorithm, named LossEstimate, for run-time estimation of the aggregate number of communication failures present in a re- gion or the entire system. Our approach has the advantage of being completely decentralised - each device computes an estimate of the number of errors using a localised, gossip-based algo- rithm. The devised method follows changes in the mean value of the communication failures over time. The algorithm has an important building block - a mechanism called DiffusionRe- set - that is able to track dynamic system properties by continuously re-starting the gossiping process. In Chapter 4 we addressed the problem of computing the number of nodes that are actively entering and exiting a large-scale system, also called churn level. A method that solves this problem can be utilised in traffic monitoring applications. At its core, it uses the DiffusionReset mechanism for being able to offer an estimate at every device in realtime. The current state of the art offers a limited set of alternatives. ChurnDetect is able to cope well also with the effects of device mobility such as dynamic, multi-hop topologies characterised by frequent device and communication failures. The next problem we addressed in Chapter 5 was the detection of a phenomenon called FlashCrowd in peer-to-peer systems. Content-delivery applications benefit from having an on- line estimation of such a phenomena. We specifically focus on the problem of online detection with an algorithm called FlashDetect. To the best of our knowledge, this is one of the first on- line methods for detecting this phenomenon. We use again the DiffusionReset mechanism and a special protocol initialisation for the peers that are entering the system - the initial gossiping values are set to 0. The remarkable property of FlashDetect is that peers do not need to ad- vertise their departure. This makes the algorithm very robust to various device and application failures, unlike competing methods. For networked embedded systems, the ultimate validation of distributed algorithms on real hardware is always preferable. In order to check our algorithms and open-up the possibility for non-IT specialists to test ideas, in Chapter 6 we introduced a software tool chain that enables fast prototyping of algorithms on large-scale systems, by abstracting away from the underlying technological complexity related to communication protocols, programming languages, operat- ing systems, virtual machines and hardware platforms. We made use of the Spatial Computing paradigm to map interactive design applications and distributed algorithms in general onto em- bedded systems middleware. By abstracting away from low-level implementation details of advanced communication protocols and numerous algorithmic building blocks, we offer the possibility for rapid prototyping of ideas. Spatial eLua accelerates thus the work of algorithm designers as well as non-IT specialists interested in making use of the latest technologies in interactive design installations. Our work has lead us to the conclusion that managing large-scale systems is possible via feedback provided by distributed algorithms that estimate various system properties. Ran- domised communication strategies such as gossip-based protocols are very well suited for cre- ating online estimators while being robust to failures. With the help of spatial computing, ideas can be tested and validated easily on various embedded platforms even by non-IT specialists. Even though we made steps in this direction, further effort is required for testing other novel algorithms and their corresponding spatial computing constructs.

Subject

large-scale systems
gossiping algorithms
self-organising systems

To reference this document use:

https://doi.org/10.4233/uuid:0948bda8-b7a2-4bd4-97a5-282b49e77042

Publisher

Sieca Repro

ISBN

9789461861603

Part of collection

Institutional Repository

Document type

doctoral thesis

Rights

Files

PDF

thesis_.pdf

2.88 MB

Close viewer