程序代写 Replace this text with appropriate bibliographic information.

Replace this text with appropriate bibliographic information.
Buravlev et al., Tuple spaces implementationsand their efficiency. https://arxiv.org/pdf/1612.02979.pdf

Tuple spaces implementations and their efficiency

, Nicola, Mezzina
IMT School for Advanced Studies Lucca, . Francesco, 19, 55100 Lucca, Italy
Among the paradigms for parallel and distributed computing, the one popularized with Linda, and based on tuple spaces, is one of the least used, despite the fact of being intuitive, easy to understand and to use. A tuple space is a repository, where processes can add, withdraw or read tuples by means of atomic operations. Tuples may contain different values, and processes can inspect their content via pattern matching. The lack of a reference implementation for this paradigm has prevented its widespread. In this paper, first we perform an extensive analysis of a number of actual implementations of the tuple space paradigm and summarise their main features. Then, we select four such implementations and compare their performances on four different case studies that aim at stressing different aspects of computing such as communication, data manipulation, and cpu usage. After reasoning on strengths and weaknesses of the four implementations, we conclude with some recommendations for future work towards building an effective implementation of the tuple space paradigm.
1. INTRODUCTION
Distributed computing is getting increasingly pervasive, with demands from various application domains and with highly diverse underlying architectures that range from the multitude of tiny devices to the very large cloud-based systems. Several paradigms for programming parallel and distributed computing have been proposed so far. Among them we can list: distributed shared memory [28] (with shared objects and tuple spaces [20] built on it) remote procedure call (RPC [7]), remote method invocation (RMI [30]) and message passing [1] (with actors [4] and MPI [5] based on it). Nowadays, the most used paradigm seems to be message passing while the least popular one seems to be the one based on tuple spaces that was proposed by for the Linda coordination model [19].
As the name suggests, message passing permits coordination by allowing exchanges of messages among distributed processes, with message delivery often mediated via brokers. In its simplest incarnation, message-passing provides a rather low-level programming abstraction for building distributed systems. Linda instead provides a higher level of abstraction by defining operations for synchronization and exchange of values between different programs that can share information by accessing common repositories named tuple spaces. The Linda interaction model provides time and space decoupling [18], since tuple producers and consumers do not need to know each other.
The key ingredient of Linda is a small number of basic operations which can be embedded into different programming languages to enrich them with communication and synchronization facilities. Three atomic operations are used for writing (out), withdrawing (in), and reading (rd) tuples into/from a tuple space. Another operation eval is used to spawn new processes. The operations for reading and withdrawing select tuples via pattern-matching and block if
arXiv:1612.02979v1 [cs.DC] 9 Dec 2016

out(⟨“goofy”, 4, 10.4⟩) ⟨“goofy”, 4, 10.4⟩
in(⟨10, x⟩)
⟨3.4, ⟨…⟩⟩
⟨“donald”, 6, 5.0⟩
⟨10, ⟨…⟩⟩
rd(⟨“goofy”, , ⟩) Figure 1. A tuple space
the wanted data are not available. Writing is instead performed by asynchronous output of the information for interacting partners. Figure 1 illustrates an example of a tuples space with different, structured, values. For example tuple ⟨“goofy”,4,10.4⟩ is produced by a process via the out(⟨“goofy”, 4, 10.4⟩) operation, and is read by the operation rd(⟨“goofy”, , ⟩) after pattern-matching: that is the process reads any tuple of three elements whose first one is exactly the string “goofy”. Moreover, tuple ⟨10,⟨…⟩⟩ is consumed (atomically retracted) by operation in(⟨10,x⟩) which consumes a tuple whose first element is 10 and binds its second element (whatever it is) to the variable x. Patterns are sometimes called templates.
The simplicity of this coordination model makes it very intuitive and easy to use. Some synchronization primitives, e.g. semaphores or barrier synchronization, can be implemented easily in Linda (cf. [10], Chapter 3). Unfortunately, Linda’s implementations of tuple spaces have turned out to be quite inefficient, and this has led researchers to opt for different approaches such OpenMP or MPI, which are nowadays offered, as libraries, for many programming languages. When considering distributed applications, the limited use of the Linda coordination model is also due to the need of guaranteeing consistency of different tuple spaces. In fact, in this case, control mechanisms that can significantly affect scalability are needed [12].
In our view, tuple spaces can be effectively exploited as a basis for the broad range of the distributed applications with different domains (from lightweight applications to large cloud based systems). However, in order to be effective, we need to take into account that performances of a tuple space system may vary depending on the system architecture and on the type of interaction between its components. Although the concept of tuple spaces is rather simple, the main challenge to face when implementing it is to devise the best data structure to deal with a possibly distributed multiset of tuples, where operations on it (e.g. pattern- matching, insertion and removal) are optimized. Moreover, it has to support efficient parallel tuples’ processing and data distribution. Depending on how these aspects are implemented, performances of an application can be positively or negatively affected.
The aim of this paper is to examine the current implementations of tuple spaces and to evaluate their strengths and weaknesses. We plan to use this information as directions for the building more efficient implementation of distributed tuple space.

We start by cataloging the existing implementations according to their features, then we focus on the most recent Linda based systems that are still maintained while paying specific attention to those offering decentralized tuples space. We compare the performances of the selected systems on four different case studies that aim at stressing different aspects of computing such as communication, data manipulation, and cpu usage. After reasoning on strength and weakness of the four implementations, we conclude with some recommendations for future work towards building an effective implementation of the tuple space paradigm.
The rest of the paper is organized as follows. In Section 2 we survey existing tuple spaces systems and choose some of them for the practical examination. The description of case studies, main principles of their implementation, and the results of the experiments are presented in Section 3. Section 4 concludes the paper by collecting some remarks and highlighting some directions for future work. This paper is a revised and extended version of [8]; it contains an additional case study, the thorough evaluation of a new tuple space system and more extensive experiments.
2. TUPLE SPACE SYSTEMS
Since the first publication on Linda [20], there have been a plenty of implementations of its coordination model in different languages. Our purpose is to review the most significant and recent ones, that are possibly still maintained, avoiding toy implementations or the one shot paper implementations. To this end, we have chosen: JavaSpaces [26] and TSpaces [24] which are two industrial proposals of tuple spaces for Java; GigaSpaces [21] which is a commercial implementation of tuple spaces; Tupleware [3] featuring an adaptive search mechanism based on communication history; Grinda [9], Blossom [33], DTuples [22] featuring distributed tuple spaces; LuaTS [23] which mixes reactive models with tuple spaces; Klaim [15] and MozartSpaces [14] which are two academic implementations with a good record of research papers based on them.
In this Section, first we review the above mentioned tuple space systems by briefly describing each of them, and single out the main features of their implementations, then we summarise these features in Table I. Later, we focus on the implementations that enjoy the characteristics we consider important for a tuple space implementation: code mobility, distribution of tuples and flexible tuples manipulation. All tuple space systems are enumerated in order they were first mentioned in publications.
Blossom. Blossom [33] is a C++ implementation of Linda which was developed to achieve high performance and correctness of the programs using the Linda model. In Blossom all tuple spaces are homogeneous with a predefined structure that demands less time for type comparison during the tuple lookup. Blossom was designed as a distributed tuple space and can be considered as a distributed hash table. To improve scalability each tuple can be assigned to a particular place (a machine or a processor) on the basis of its values. The selection of the correspondence of the tuple and the machine is based on the following condition: for every tuple the field access pattern is defined, that determines which fields always contain value (also for templates); values of these fields can be hashed to obtain a number which determines the place where a tuple has to be stored. Conversely, using the data from the template, it is possible to find the exact place where a required tuple is potentially stored. Prefetching allows a process to send an asynchronous (i.e. non-blocking) request for a tuple and to continue its work while the search is performed. When the requested tuple is needed, if found, it is received without waiting.
TSpaces. TSpaces [24] is an implementation of the Linda model developed at the IBM Almaden Research Center. It combines asynchronous messaging with database features. TSpaces provides a transactional support and a mechanism of tuple aging. Moreover, the

embedded mechanism for access control to tuple spaces is based on access permission. It checks whether a client is able to perform specific operations in the specific tuples space. Pattern matching is performed using either standard equals method or compareTo method. It can use also SQL-like query that allows matching tuples regardless of their structure, e.g., ignoring the order in which fields are stored.
Klaim. Klaim [15] (A Kernel Language for Agents Interaction and Mobility) is an extension of Linda supporting distribution and processes mobility. Processes, like any other data, can be moved from one locality to another and can be executed at any locality. Klava [6] is a Java implementation of Klaim that supports multiple tuple spaces and permits operating with explicit localities where processes and tuples are allocated. In this way, several tuples can be grouped and stored in one locality. Moreover, all the operations on tuple spaces are parameterized with a locality. The emphasis is put also on access control which is important for mobile applications. For this reason, Klaim introduces a type system which allows checking whether a process is allowed to perform specific operations at specific localities.
JavaSpaces. JavaSpaces [26] is one of the first implementations of tuple spaces developed by Sun Microsystems. It is based on a number of Java technologies (e.g., Jini and RMI). Like TSpaces, JavaSpaces supports transactions and a mechanism of tuple aging. A tuple, called entry in JavaSpaces, is an instance of a Java class and its fields are the public properties of the class. This means that tuples are restricted to contain only objects and not primitive values. The tuple space is implemented by using a simple Java collection. Pattern matching is performed on the byte-level, and the byte-level comparison of data supports object-oriented polymorphism.
GigaSpaces. GigaSpaces [21] is a contemporary commercial implementation of tuple spaces. Nowadays, the core of this system is GigaSpaces XAP, a scale-out application server; user applications should interact with the server to create and use their own tuple space. The main areas where GigaSpaces is applied are those concerned with big data analytics. Its main features are linear scalability, optimization of RAM usage, synchronization with databases and several database-like features such as complex queries, transactions, and replication.
LuaTS. LuaTS [23] is a reactive event-driven tuple space system written in Lua. Its main features are the associative mechanism of tuple retrieving, fully asynchronous operations and the support of code mobility. LuaTS provides centralized management of the tuple space which can be logically partitioned into several parts using indexing. LuaTS combines the Linda model with the event-driven programming paradigm. This paradigm was chosen to simplify program development since it allows avoiding the use of synchronization mechanisms for tuple retrieval and makes more transparent programming and debugging of multi-thread programs. Tuples can contain any data which can be serialized in Lua. To obtain a more flexible and intelligent search of tuples, processes can send to the server code that once executed returns the matched tuples. The reactive tuple space is implemented as a hash table, in which data are stored along with the information supporting the reactive nature of that tuple space (templates, client addresses, callbacks and so on).
MozartSpaces. MozartSpaces [14] is a Java implementation of the space-based approach [27]. The implementation was initially based on the eXtensible Virtual Shared Memory (XVSM) technology, developed at the Space Based Computing Group, Institute of Computer Languages, Vienna University of Technology. The basic idea of XVSM is related to the concept of coordinator: an object defining how tuples (called entries) are stored. For the retrieval, each coordinator is associated with a selector, an object that defines how entries can be fetched. There are several predefined coordinators such as FIFO, LIFO, Label (each tuple is identified by a label, which can be used to retrieve it), Linda (corresponding to the classic tuple

matching mechanism), Query (search can be performed via a query-like language) and many others. Along with them, a programmer can define a new coordinator or use a combination of different coordinators (e.g. FIFO and Label). MozartSpaces provides also transactional support and a role based access control model [13].
DTuples. DTuples [22] is designed for peer-to-peer networks and based on distributed hash tables (DHT), a scalable and efficient approach. Key features of DHT are autonomy and decentralization. There is no central server and each node of the DHT is in charge of storing a part of the hash table and of keeping routing information about other nodes. As the basis of the DTH’s implementation DTuples uses FreePastry∗. DTuples supports transactions and guarantees fault-tolerance via replication mechanisms. Moreover, it supports multi tuple spaces and allows for two kinds of tuple space: public and subject. A public tuple space is shared among all the processes and all of them can perform any operation on it. A subject tuple space is a private space accessible only by the processes that are bound to it. Any subject space can be bound to several processes and can be removed if no process is bound to it. Due to the two types of tuple spaces, pattern matching is specific for each of them. Templates in the subject tuple space can match tuples in the same subject tuple space and in the common tuple space. However, the templates in the common tuple space cannot match the tuple in the subject tuple spaces.
Grinda. Grinda [9] is a distributed tuple space which was designed for large scale infrastructures. It combines the Linda coordination model with grid architectures aiming at improving the performance of distributed tuple spaces, especially with a lot of tuples. To boost the search of tuples, Grinda utilizes spatial indexing schemes (X-Tree, Pyramid) which are usually used in spatial databases and Geographical Information Systems. Distribution of tuple spaces is based on the grid architecture and implemented using structured P2P network (based on Content Addressable Network and tree based).
Tupleware. Tupleware [3] is specially designed for array-based applications in which an array is decomposed into several parts each of which can be processed in parallel. It aims at developing a scalable distributed tuple space with good performances on a computing cluster and provides simple programming facilities to deal with both distributed and centralized tuple space. The tuple space is implemented as a hashtable, containing pairs consisting of a key and a vector of tuples. Since synchronization lock on Java hashtable is done at the level of the hash element, it is possible to access concurrently to several elements of the table. To speed up the search in the distributed tuple space, the system uses an algorithm based on the history of communication. Its main aim is to minimize the number of communications for tuples retrieval. The algorithm uses success factor, a real number between 0 and 1, expressing the likelihood of the fact that a node can find a tuple in the tuple space of other nodes. Each instance of Tupleware calculates success factor on the basis of previous attempts and first searches tuples in nodes with greater success factor.
In order to compare the implementations of the different variants of Linda that we have considered so far, we have singled out two groups of criteria.
The first group refers to criteria which we consider fundamental for any tuple space system:
eval operation This criterion denotes whether the tuple space system has implemented the eval operation and, therefore, allows using code mobility. It is worth mentioning that the original eval operation was about asynchronous evaluation and not code mobility, but in the scope of a distributed tuple space, it makes programming data manipulation more flexible.
∗ FreePastry is an open-source implementation of Pastry, a substrate for peer-to-peer applications (http://www.freepastry.org/FreePastry/).

Tuple clustering
No domain specificity
Distributed tuple space
Decentralized management
Scalability
JavaSpaces (JSP), TSpaces (TSP), GigaSpaces (GSP), Tupleware (TW), Grinda (GR), Blossom (BL), DTuples (DTP), LuaTS (LTS), Klaim (KL) MozartSpaces (MS)
Table I. Results of the comparison
Tuples clustering This criterion determines whether some tuples are grouped by particular parameters that can be used to determine where to store them in the network.
Absence of domain specificity Many implementations have been developed having a particular application domain in mind. On the one hand, this implies that domain- specific implementations outperform the general purpose one, but on the other hand, this can be considered as a limitation if one aims at generality.
Security This criterion specifies whether an implementation has security features or not. For instance, a tuple space can require an authorization and regulate the access to its tuples, for some of them, the access can be limited to performing specific operations (e.g. only writes or read).
The second group, of criteria, gathers features which are desirable for any fully distributed implementation that runs over a computer network, does not rely on a single node of con

程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts