Submitted on the 8th of July, 2005
Recovery is the last resort for preserving data and system state consistency in a failure-prone environment. Critical applications use transactional database servers whose data recovery mechanisms establish atomic updates and durability of data in the presence of transient system failures. Unfortunately, data recovery on database servers does not enforce an appropriate exception handling in the other application components. It is the responsibility of every single component in the system to handle all system failures such as message losses, timeouts, and crashes in a correct manner. In a distributed application with a rich state some component interdependences are often overlooked, which leads to incorrect application behavior in that some requests may unintentionally be repeated whereas others may not be executed at all due to message losses.
This has motivated several recovery protocols aiming at masking system failures, and so relieving developers from dealing with them. The queued transactions approach has been the most successful industrial solution thus far. It requires that components store their state in transactional input and output message queues mostly residing on a database server, or in a database. In a multi-tier system, a single end-user request incurs a number of instances of the Two-Phase-Commit protocol incurring high logging overhead. Due to an inconvenient programming model and for insufficient scalability in the context of multi-tier applications, queued transactions have not been adopted for Web Services, although most of them are stateful by nature since they require several interactions with the user to accomplish a deal: authentication, catalog search, price negotiation or bidding, and finally committing the deal. This thesis elaborates on a recently proposed framework of interaction contracts geared towards general multi-tier applications that is more efficient than the queued transactions approach, and does not enforce any specific programming style.
This thesis provides for the first time a formal specification for each interaction contract previously only informally described in the original literature. To this end, we adopted the state-and-activity chart language as defined and implemented in the commercial tool Statemate, widely used for modeling reactive systems such as embedded devices in the automotive and airspace industries. Each individual interaction contract is defined by a generic activity that can be easily reused in every application context. We model a complex Web Service comprising several components, which pass messages to each other either in synchronous or asynchronous fashion with the generic interaction contract activities as building blocks. Most importantly, the Web Service model does not involve any recovery actions other than those defined in the underlying interaction contract activities that are invisible at the application layer.
After completing the formal specification process, we start with verification of the interaction contracts using Statemate’s integrated model checker. For this purpose, we formulate crucial safety and liveness properties as temporal logic formulae. As for safety, we show that no message is ever executed more than once. For liveness, we prove that with a finite number of failures each interaction contract eventually terminates, and the corresponding requests are executed exactly once. While the verification of the individual bilateral interaction contracts is straightforward due to their relatively small model size, additional design engineering effort is needed to keep the Web Service model verifiable. We succeed in designing equivalent or more general, verifiable models, whose safety properties carry over into the original specification of the interaction contracts.
Along with the formal specification of the interaction contract framework, in this thesis we describe a prototype Web Service platform called EOS that we built to investigate the framework’s viability in a real-world setting. More specifically, we consider two popular products used in the Web Service context: Microsoft Internet Explorer as a browser (user front-end), and a script engine PHP as a Web application server which can be invoked either by a browser or by another application server. We implement the external interaction contract to handle interactions between an end-user and her browser. Interactions between a pair of Web application servers, and between a browser and a Web application server run under either the committed or the immediately committed interaction contract. To this end, we turned the browser and the Web application server into persistent components by equipping them with logging and recovery routines. In accordance with the framework goals, we achieved this without rewriting existing application programs such as PHP scripts and the browser by solely changing their runtime environment. The most challenging part of this work was providing the deterministic replay of the multi-threaded PHP script engine in the business-to-business context, in which the state is shared by multiple sessions and may be simultaneously accessed by several other application servers. Thus, deterministic replay requires logging of original output messages. Enhanced components exhibit acceptable overhead in comparison with the original implementation, which shows their viability in large-scale Web Services.